#Overview & Goals

This markdown file is where we conduct our exploratory data analysis (EDA).

Below you will find the questions that we are using to explore the data and who is responsible for conducting the data analysis associated with that question.

By the end of this process we will have as a team:

  1. Imported the cleaned dataset from Lab 3

  2. Examined all of the EDA questions, with clear code & written explanations of the results.

  3. Created data visualizations for the various EDAs.

    • One question to resolve: is Lilly creating all the viz, or should this be the responsibility of the people assigned to the questions?
    • Either way, we need one coherent theme across all viz to be imported into the presentation.
  4. A coherent final presentation (in a separate powerpoint created by Lilly), along with a clear assignment of who is presenting what.

LINK TO SLIDES: https://docs.google.com/presentation/d/1Z7EedURd1mhE6WoUVpifnr4v6w-2GmTj2a6IKPiLOHg/edit?usp=sharing

*General notes: Some example code for cleaning and data analysis will be pulled into the presentation, so make sure you annotate your code as you go so we can explain what each line does if needed.

Keep committing your progress with clear messages so we can see contributions and troubleshoot errors if needed.

#Task Assignment

0.0.1 1. Traffic & Volume Patterns

Lilly is going to be working on this question


0.0.2 2. Bid Behavior & Competitiveness

Edwin is going to be working on this question


0.0.3 3. Auction Outcomes

Bingtang is going to be working on this question


0.0.4 4. Geographic Analysis

The original question looked at the impact of data cleaning on the analysis. The data that needed to be cleaned was not a significant portion of the total dataset, so the statistics did not change significantly before vs. after cleaning.


#Setup

##Packages

library(tidyverse)
library(arrow)
library(logger)
library(glue)
library(dplyr)
library(tidyr)
library(rlang)
library(lubridate)
library(tictoc)
library(here)
library(jsonlite)
library(scales)
library(knitr)
library(kableExtra)
library(DT)
library(tigris)     # US geographic boundaries, including zctas()
library(sf)         # spatial data handling
library(stringr)    # string cleaning
library(zipcodeR)
library(ggridges)   # ridge plots
library(viridis)    # colorblind-safe palettes (maps)
library(mgcv)       # GAM/BAM modeling
library(pROC)       # ROC curves and AUC

source(here("src", "data_cleaning.r"))  # Functions loaded at runtime - linter warnings are false positives


#Lilly imported these from Lab 3. Please feel free to add any additional libraries we might need.

*Note: please try to use tidyverse packages and functions, we want to make sure everyone is familiar with the functions we are using.

#Import the cleaned dataset

#READ THIS BEFORE RUNNING CODE
#The cleaned data from Lab 03 is not saved to disk. To use the readRDS() shortcut below, first run "saveRDS(bids_clean, here::here("data", "bids_clean.rds"))" at line 1318 of Lab 03, and make sure NOT to commit that change. Otherwise, the code below rebuilds the cleaned dataset from the raw parquet file.


# bids_clean <- readRDS(here::here("data", "bids_clean.rds"))
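One way to avoid re-running the full cleaning step every time is to cache the cleaned data the first time it is built and read it back afterwards. The sketch below is only an illustration: `load_or_build()` is a hypothetical helper, not part of `data_cleaning.r`, and the toy data frame and temporary file stand in for `bids_clean` and `here::here("data", "bids_clean.rds")`.

```r
# Hypothetical helper (not in src/data_cleaning.r): read a cached .rds file if
# it exists, otherwise build the object once and cache it for the next run.
load_or_build <- function(path, build_fn) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build_fn()
  saveRDS(obj, path)
  obj
}

# Toy stand-in for the cleaning pipeline, so the sketch runs on its own.
build_calls <- 0
toy_build <- function() {
  build_calls <<- build_calls + 1
  data.frame(row_id = 1:3, PRICE_final = c(0.04, 0.07, 0.23))
}

cache_path <- tempfile(fileext = ".rds")        # stand-in for the real data path
first  <- load_or_build(cache_path, toy_build)  # builds and caches
second <- load_or_build(cache_path, toy_build)  # reads the cached copy
```

With this pattern the expensive build function runs only when the cache file is missing.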


data_cleaning_pipeline <- function(df, expected_columns, zip_code_db = NULL, save_path = NULL, verbose = TRUE) {
  df <- clean_price_column(df,
                           min_price = 0,
                           max_price = 10,
                           fix_leading_o = TRUE,
                           verbose = verbose)

  df <- clean_geo_region_column(df,
                                verbose = verbose)

  df <- clean_zip_column(df,
                         zip_code_db = zip_code_db,
                         verbose = verbose)

  df <- clean_response_time_column(df,
                                   col_name = "RESPONSE_TIME",
                                   output_col_name = "RESPONSE_TIME_clean",
                                   extract_digits = TRUE,
                                   verbose = verbose)

  df <- clean_timestamp_column(df,
                               col_name = "TIMESTAMP",
                               verbose = verbose)

  df <- clean_city_column(df,
                          zip_code_db = zip_code_db,
                          verbose = verbose)

  df <- clean_geo_coordinates_column(df,
                                     verbose = verbose)

  df <- clean_bids_won_column(df,
                              verbose = verbose)

  df <- clean_date_column(df,
                          col_name = "DATE_UTC",
                          output_col_name = "DATE_UTC_clean",
                          verbose = verbose)

  df <- clean_device_type_column(df,
                                 col_name = "DEVICE_TYPE",
                                 output_col_name = "DEVICE_TYPE_clean",
                                 verbose = verbose)

  df <- clean_response_time_column(df,
                                   col_name = "RESPONSE_TIME",
                                   output_col_name = "RESPONSE_TIME_clean",
                                   extract_digits = TRUE,
                                   verbose = verbose)

  df <- clean_requested_sizes_column(df,
                                     col_name = "REQUESTED_SIZES",
                                     output_col_name = "REQUESTED_SIZES_clean",
                                     verbose = verbose)

  duplicate_handler <- remove_duplicates(df,
                                         exclude_cols = c("row_id"),
                                         verbose = verbose)
  df <- duplicate_handler[["df"]]
  removed_indices <- duplicate_handler[["removed_indices"]]

  if (!is.null(save_path)) {
    write_parquet(df, save_path)
  }

  return(df)
}


# ---- Timing Start ----
run_time <- system.time({

#------------------------------------------------------
# LOAD EXPECTED BIDS COLUMNS FROM CSV
#------------------------------------------------------
expected_columns <- data.frame(readr::read_csv(
  here::here("src", "expected_columns.csv"),
  col_types = "ccc"
))

#------------------------------------------------------
# LOAD BIDS DATA FROM PARQUET
#------------------------------------------------------
cat("\n")
cat(strrep("=", 70), "\n")
cat("STARTING BIDS DATA PROCESSING\n")
cat(strrep("=", 70), "\n\n")

# Load data
cat("Loading data...\n")
original_bids <- read_parquet(here("data", "bids_data_vDTR.parquet"))
bids <- original_bids %>% mutate(row_id = row_number())
cat(glue("Loaded {nrow(original_bids)} rows and {ncol(original_bids)} columns\n\n"))

#------------------------------------------------------
# LOAD ZIPCODE DATA
#------------------------------------------------------
# Load ZIP → city lookup from zipcodeR
zip_code_db <- load_oregon_zips()

#------------------------------------------------------
# CHECK FOR MISSING COLUMNS
#------------------------------------------------------
missing_columns <- check_columns(bids, expected_columns$column)
cat(glue::glue("There are {length(missing_columns)} missing column(s): \n {paste(missing_columns, collapse = ', ')}"))

#------------------------------------------------------
# TYPE SUMMARY
#------------------------------------------------------
bids_type_summary <- check_column_types(bids, expected_columns)
print(bids_type_summary)

save_path <- NULL
# save_path <- here("data", "bids_data_vDTR_clean.parquet")
bids <- data_cleaning_pipeline(bids, expected_columns, zip_code_db, save_path, verbose = TRUE)

cat("\n")
cat(strrep("=", 70), "\n")
cat("CREATING FINAL CLEANED DATASET\n")
cat(strrep("=", 70), "\n\n")
# Create final cleaned dataset
bids_clean <- bids %>%
  select(
    row_id,
    DATE_UTC_clean,
    TIMESTAMP_clean,
    AUCTION_ID,
    PUBLISHER_ID,
    PRICE_final,
    DEVICE_GEO_REGION_clean,
    DEVICE_GEO_ZIP_clean,
    DEVICE_GEO_CITY_clean,
    DEVICE_GEO_LAT_clean,
    DEVICE_GEO_LONG_clean,
    BID_WON_clean,
    RESPONSE_TIME_clean,
    DEVICE_TYPE_clean,
    SIZE,
    REQUESTED_SIZES_clean
  )
  # %>%
  # rename_with(~ str_remove(., "(_clean|_final)$"))


print(class(bids_clean))
glimpse(bids_clean)
# NA counts per column
na_count_by_col <- colSums(is.na(bids_clean %>% select(-REQUESTED_SIZES_clean)))
cat("\nNA Counts per Column:\n")
cat(strrep("=", 70), "\n")
print(na_count_by_col)
total_na_rows <- sum(!complete.cases(bids_clean %>% select(-REQUESTED_SIZES_clean)))
print(glue("Total NA rows: {total_na_rows}"))

})  # ---- Timing End ----
## 
## ====================================================================== 
## STARTING BIDS DATA PROCESSING
## ====================================================================== 
## 
## Loading data...
## Loaded 443969 rows and 15 columns
## 
##  Loading Oregon ZCTA data 
## Loading Oregon ZCTA data from cached parquet...
## There are 1 missing column(s): 
##  DEVICE_GEO_COUNTRY
##               column    actual Expected_Type match  Notes_Actual_Type
## 1          TIMESTAMP character       POSIXct FALSE      TIMESTAMP_NTZ
## 2           DATE_UTC character          Date FALSE               DATE
## 3         AUCTION_ID character     character  TRUE            VARCHAR
## 4       PUBLISHER_ID character     character  TRUE            VARCHAR
## 5        DEVICE_TYPE   integer     character FALSE            VARCHAR
## 6  DEVICE_GEO_REGION character     character  TRUE         VARCHAR(2)
## 7    DEVICE_GEO_CITY character     character  TRUE            VARCHAR
## 8     DEVICE_GEO_ZIP character     character  TRUE        VARCHAR(10)
## 9     DEVICE_GEO_LAT   numeric       numeric  TRUE              FLOAT
## 10   DEVICE_GEO_LONG   numeric       numeric  TRUE              FLOAT
## 11   REQUESTED_SIZES character          list FALSE VARCHAR (or ARRAY)
## 12              SIZE character     character  TRUE            VARCHAR
## 13             PRICE character       numeric FALSE       NUMBER(12,6)
## 14     RESPONSE_TIME character       integer FALSE       NUMBER(10,0)
## 15           BID_WON character       logical FALSE            BOOLEAN
## 
##  ============================================================ 
## Converting PRICE to numeric 
## ============================================================ 
## Applying preprocessing...
## Found 1 non-numeric value(s), attempting to fix... 
## Converting PRICE... 
## PRICE_clean is now: numeric 
## 
## NA COUNT: 
## There are 0 NAs in PRICE_clean 
## 
## 
##  ============================================================ 
## Cleaning DEVICE_GEO_REGION column 
## ============================================================ 
## Current values in DEVICE_GEO_REGION:
##     Or     OR oregon    xor   <NA> 
##  53689 333826  41513  14941      0 
## # A tibble: 3 × 2
##   region_lower      n
##   <chr>         <int>
## 1 or           387515
## 2 oregon        41513
## 3 xor           14941
## 
##  Number of NA values in DEVICE_GEO_REGION_clean: 0 
## 
##  ============================================================ 
## Cleaning DEVICE_GEO_ZIP column 
## ============================================================ 
## Current values in DEVICE_GEO_ZIP:
## -------------------------------------------------- 
## ZIP CODE RECOVERY REPORT
## -------------------------------------------------- 
##   Original missing (NA): 21198
##   Original sentinels:    18
##   Total bad ZIPs:        21216
##   Spatial join matches:  440849 (points matched to ZCTA polygons)
##   Recovered ZIPs:        21196
##   Remaining NA:          20
##   Recovery rate:         99.9%
## -------------------------------------------------- 
## 
## 
##  ============================================================ 
## Converting RESPONSE_TIME to integer 
## ============================================================ 
## Applying preprocessing...
## Extracting digits from string... 
## Converting RESPONSE_TIME... 
## RESPONSE_TIME_clean is now: integer 
## 
## NA COUNT: 
## There are 0 NAs in RESPONSE_TIME_clean 
## 
## 
##  ============================================================ 
## Converting TIMESTAMP_clean to POSIXct 
## ============================================================ 
## Converting TIMESTAMP_clean... 
## TIMESTAMP_clean is now: POSIXct 
## 
## NA COUNT: 
## There are 0 NAs in TIMESTAMP_clean 
## 
## 
##  ============================================================ 
## Cleaning DEVICE_GEO_CITY column 
## ============================================================ 
## 
## -------------------------------------------------- 
## CITY RECOVERY REPORT
## -------------------------------------------------- 
##   Original missing:  21198
##   Recovered via ZIP: 21196
##   Remaining NA:      2
## -------------------------------------------------- 
##   Unmatched ZIPs: 0 unique values
##   Top unmatched ZIPs:
## # A tibble: 0 × 2
## # ℹ 2 variables: DEVICE_GEO_ZIP_clean <chr>, n <int>
## 
##  ============================================================ 
## Cleaning DEVICE_GEO_LAT and DEVICE_GEO_LONG columns 
## ============================================================ 
## Latitudes are consistent with Oregon.
## Longitudes include locations outside Oregon.
## Number of implausible coordinates: 100 
## 
##  ============================================================ 
## Cleaning BID_WON column 
## ============================================================ 
## Current values in BID_WON:
## 
##  FALSE   true   TRUE   <NA> 
## 323285     10 120674      0 
## 
## Current values in BID_WON_clean:
##  FALSE   TRUE   <NA> 
## 323285 120684      0 
## 
##  ============================================================ 
## Converting DATE_UTC to Date 
## ============================================================ 
## Converting DATE_UTC... 
## DATE_UTC_clean is now: Date 
## 
## NA COUNT: 
## There are 0 NAs in DATE_UTC_clean 
## 
## 
##  ============================================================ 
## Converting DEVICE_TYPE to character 
## ============================================================ 
## Converting DEVICE_TYPE... 
## DEVICE_TYPE_clean is now: character 
## 
## NA COUNT: 
## There are 0 NAs in DEVICE_TYPE_clean 
## 
## 
##  ============================================================ 
## Converting RESPONSE_TIME to integer 
## ============================================================ 
## Applying preprocessing...
## Extracting digits from string... 
## Converting RESPONSE_TIME... 
## RESPONSE_TIME_clean is now: integer 
## 
## NA COUNT: 
## There are 0 NAs in RESPONSE_TIME_clean 
## 
## 
##  ============================================================ 
## Converting REQUESTED_SIZES to list 
## ============================================================ 
## Parsing JSON elements... 
## REQUESTED_SIZES is now: list 
## 
## 
##  ============================================================ 
## Removing duplicate rows 
## ============================================================ 
## Removed 2434 duplicate rows. 
## Remaining rows: 441535 
## 
## ====================================================================== 
## CREATING FINAL CLEANED DATASET
## ====================================================================== 
## 
## [1] "tbl_df"     "tbl"        "data.frame"
## Rows: 441,535
## Columns: 16
## $ row_id                  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
## $ DATE_UTC_clean          <date> 2025-10-21, 2025-10-21, 2025-10-21, 2025-10-2…
## $ TIMESTAMP_clean         <dttm> 2025-10-21 23:42:37, 2025-10-21 23:42:37, 202…
## $ AUCTION_ID              <chr> "0000060c-b8a9-414b-aeae-5f841472d6bb", "00000…
## $ PUBLISHER_ID            <chr> "LteIcOiSsaE5", "LteIcOiSsaE5", "LteIcOiSsaE5"…
## $ PRICE_final             <dbl> 0.04000000, 0.06728000, 0.23000000, 0.04475100…
## $ DEVICE_GEO_REGION_clean <chr> "OR", "OR", "OR", "OR", "OR", "OR", "OR", "OR"…
## $ DEVICE_GEO_ZIP_clean    <chr> "97302", "97302", "97302", "97302", "97302", "…
## $ DEVICE_GEO_CITY_clean   <chr> "Salem", "Salem", "Salem", "Salem", "Salem", "…
## $ DEVICE_GEO_LAT_clean    <dbl> 44.9036, 44.9036, 44.9036, 44.9036, 44.9036, 4…
## $ DEVICE_GEO_LONG_clean   <dbl> -123.0461, -123.0461, -123.0461, -123.0461, -1…
## $ BID_WON_clean           <chr> "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "F…
## $ RESPONSE_TIME_clean     <int> 259, 86, 80, 259, 461, 252, 361, 173, 178, 92,…
## $ DEVICE_TYPE_clean       <chr> "4", "4", "4", "4", "4", "1", "1", "1", "1", "…
## $ SIZE                    <chr> "320x50", "320x50", "320x50", "320x50", "320x5…
## $ REQUESTED_SIZES_clean   <list> <"320x50", "300x50">, <"320x50", "300x50">, <…
## 
## NA Counts per Column:
## ====================================================================== 
##                  row_id          DATE_UTC_clean         TIMESTAMP_clean 
##                       0                       0                       0 
##              AUCTION_ID            PUBLISHER_ID             PRICE_final 
##                       0                       0                     452 
## DEVICE_GEO_REGION_clean    DEVICE_GEO_ZIP_clean   DEVICE_GEO_CITY_clean 
##                       0                      20                       2 
##    DEVICE_GEO_LAT_clean   DEVICE_GEO_LONG_clean           BID_WON_clean 
##                       0                     100                       0 
##     RESPONSE_TIME_clean       DEVICE_TYPE_clean                    SIZE 
##                       0                       0                       0 
## Total NA rows: 570
cat(glue::glue("\n\nTotal runtime for data cleaning: {round(run_time[['elapsed']], 2)} seconds\n"))
## 
## Total runtime for data cleaning: 30.08 seconds
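Since the team is standardizing on tidyverse functions, the NA counts reported above could also be computed with `across()` and `if_any()`. This is a minimal sketch on a made-up three-row tibble; the values are illustrative and are not taken from the bids data.

```r
library(dplyr)

# Toy data standing in for bids_clean; values are invented for illustration.
toy <- tibble(
  PRICE_final           = c(0.04, NA, 0.23),
  DEVICE_GEO_LONG_clean = c(-123.0461, -122.6, NA),
  DEVICE_TYPE_clean     = c("4", "1", "1")
)

# One row, one column per variable, each holding that variable's NA count.
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of rows with at least one NA in any column.
rows_with_na <- toy %>%
  filter(if_any(everything(), is.na)) %>%
  nrow()
```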

#Run Exploratory Data Analysis

0.0.5 1. Traffic & Volume Patterns

*How does bidding volume change across hours of the day and days of the week?

# Extract hour and day of week
bids_clean <- bids_clean %>%
  mutate(
    hour = hour(TIMESTAMP_clean),
    day_of_week = wday(TIMESTAMP_clean, label = TRUE, abbr = FALSE, week_start = 1),

    day_of_week = factor(day_of_week,
                         levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
                         ordered = FALSE)
  )

# Summarize bid volume by hour
hourly_volume <- bids_clean %>%
  count(hour)


all_days <- factor(
  c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  ordered = FALSE
)

# Summarize bid volume by day of week (include 0s)
daily_volume <- bids_clean %>%
  count(day_of_week) %>%
  complete(day_of_week = all_days, fill = list(n = 0))

# Plot: Bidding volume by hour
ggplot(hourly_volume, aes(x = hour, y = n, fill = as.factor(hour))) +
  geom_col(show.legend = FALSE) +
  scale_x_continuous(
    breaks = 0:23,
    labels = format(strptime(0:23, format = "%H"), format = "%I %p")
  ) +
  scale_fill_viridis_d(option = "cividis") +
  labs(
    title = "Bidding Volume by Hour of Day",
    x = "",
    y = "# of Bids"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Plot: Bidding volume by day of week
ggplot(daily_volume, aes(x = day_of_week, y = n, fill = day_of_week)) +
  geom_col(show.legend = FALSE) +
  scale_y_continuous(labels = label_comma()) +
  scale_fill_viridis_d(option = "cividis") +
  labs(
    title = "Bidding Volume by Day of Week",
    x = "",
    y = "# of Bids"
  ) +
  theme_minimal()

Comments: At first I thought something was wrong, since we only see results for a Tuesday and a Wednesday. After investigation, the original dataset given to us only has data from 10/21/2025 and 10/22/2025, so this makes sense. This limits the conclusions we can draw from the data, but it is not an error.
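A quick way to confirm the date coverage is to count rows per calendar date and weekday. The sketch below uses a few made-up timestamps in place of `TIMESTAMP_clean`; the counts are not the real ones.

```r
library(dplyr)
library(lubridate)

# Invented timestamps covering two dates, standing in for TIMESTAMP_clean.
toy <- tibble(
  TIMESTAMP_clean = ymd_hms(c(
    "2025-10-21 23:42:37", "2025-10-21 23:50:00",
    "2025-10-22 01:15:00", "2025-10-22 02:30:00", "2025-10-22 03:05:00"
  ), tz = "UTC")
)

# Rows per calendar date, with the weekday name alongside.
date_coverage <- toy %>%
  mutate(
    date    = as_date(TIMESTAMP_clean),
    weekday = wday(date, label = TRUE, abbr = FALSE)
  ) %>%
  count(date, weekday)
```

If `date_coverage` has only two rows on the real data, the weekday chart can only ever show two non-zero bars.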

Note: For the plots, I intentionally used a minimal theme and made sure that the color schemes are color-blind friendly and accessible. Specifically, “cividis” is designed to stay readable across all types of color vision, including in grayscale.

*When are the peak bidding periods, and what might explain those spikes?

# Identify top 3 bidding hours
top_hours <- hourly_volume %>%
  slice_max(n, n = 3)   # top 3 rows by count, already sorted from highest to lowest

# Identify peak day(s)
top_days <- daily_volume %>%
  slice_max(n, n = 1)

print(top_hours)
## # A tibble: 3 × 2
##    hour     n
##   <int> <int>
## 1     2 67111
## 2     3 61514
## 3     1 60233
print(top_days)
## # A tibble: 1 × 2
##   day_of_week      n
##   <fct>        <int>
## 1 Wednesday   273779

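The top three bidding hours are 2 am, 3 am, and 1 am, with 67,111, 61,514, and 60,233 bids respectively. Early-morning spikes like these could point to automated bidding, bidders operating from other time zones, or backlogged jobs being processed in batches. The time-zone explanation does not apply to location here, since all the bids come from OR. Because it is very unlikely that humans generate this much traffic at these hours, scheduled systems or ad bots are the more plausible source. One caveat: these hours are read directly from TIMESTAMP_clean, and if that column is stored in UTC (as the DATE_UTC column name suggests), 1 am to 3 am UTC corresponds to 6 pm to 8 pm Pacific Daylight Time on the previous evening, which would instead be consistent with ordinary evening browsing.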
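If `TIMESTAMP_clean` is stored in UTC (an assumption suggested by the `DATE_UTC` column name, not something the data confirms), the peak hours can be re-expressed in Oregon local time with `lubridate::with_tz()`. A minimal sketch on made-up timestamps at the three peak hours:

```r
library(lubridate)

# Invented UTC timestamps at the three peak hours reported above.
ts_utc <- ymd_hms(
  c("2025-10-22 01:00:00", "2025-10-22 02:00:00", "2025-10-22 03:00:00"),
  tz = "UTC"
)

# Same instants shown on the Pacific clock. Assumption: "America/Los_Angeles"
# is the relevant zone for these devices (it covers most of Oregon).
ts_local <- with_tz(ts_utc, tzone = "America/Los_Angeles")

hour_utc   <- hour(ts_utc)     # 1, 2, 3
hour_local <- hour(ts_local)   # 18, 19, 20 (Pacific Daylight Time, UTC-7)
```

Applying the same conversion inside the `mutate()` call that creates `hour` would show whether the peak really falls overnight or in the early evening.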

Wednesday had the highest volume with 273,779 bids, but again we only have data from two days of the week so that does not mean much here.

0.0.6 2. Bid Behavior & Competitiveness

0.1 Price difference between winning and losing bids

# Ridgeline plot of price distributions for won vs lost bids

ggplot(bids_clean, aes(y = BID_WON_clean, x = PRICE_final, fill = BID_WON_clean)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2) +      # ridge density curves
  # A plot can only have one fill scale (a second one overrides the first),
  # so only the cividis scale is kept to match the other figures
  scale_fill_viridis_d(
    option = "cividis",
    name   = "Bid Outcome",
    labels = c("FALSE" = "Lost Bid", "TRUE" = "Won Bid")  # legend labels
  ) +
  scale_y_discrete(labels = c("FALSE" = "Lost Bids", "TRUE" = "Won Bids")) +
  labs(
    title = "Bid Price Distribution by Bid Outcome",      # plot title
    subtitle = "Ridgeline plot highlights distribution shapes and overlap",
    x = "Bid Price",                                      # x-axis label
    y = "Bid Outcome"                                     # y-axis label
  ) +
  theme_minimal(base_size = 13) +                         # clean theme
  theme(plot.title = element_text(face = "bold"))         # bold title

0.1.1 Figure summary

The ridgeline plot gives a clear picture of how bid prices differ between winning and losing bids. Most losing bids are packed tightly at the lower end of the price range, while winning bids tend to extend into slightly higher prices. There is some overlap at very low bid amounts, but overall the curve for winning bids stretches farther to the right. This pattern suggests that higher bid prices generally improve the chances of winning.

0.2 Violin - Boxplot bid outcome representation

# Violin + Boxplot comparing bid prices for wins vs losses

ggplot(bids_clean, aes(x = BID_WON_clean, y = PRICE_final, fill = BID_WON_clean)) +
  geom_violin(trim = FALSE, alpha = 0.6) +                 # violin shows price distribution shape
  geom_boxplot(width = 0.15, outlier.shape = NA, alpha = 0.9) +  # boxplot shows median + IQR
  # A plot can only have one fill scale (a second one overrides the first),
  # so only the cividis scale is kept to match the other figures
  scale_fill_viridis_d(option = "cividis") +
  scale_x_discrete(labels = c("FALSE" = "Lost Bids", "TRUE" = "Won Bids")) +
  labs(
    title = "Distribution of Bid Prices by Bid Outcome",    # plot title
    subtitle = "Violin + Boxplot highlight price differences",
    x = "Bid Outcome",
    y = "Bid Price"
  ) +
  theme_minimal(base_size = 13) +                           # clean theme
  theme(
    plot.title = element_text(face = "bold", size = 15),    # bold title
    strip.text = element_text(face = "bold"),
    legend.position = "none"                                # hide legend (labels on x-axis)
  )

0.2.1 Figure summary

The violin–boxplot shows clear differences in how bid prices behave for winning and losing bids. Losing bids are mostly concentrated at very low prices, with only a few stretching upward. In contrast, winning bids tend to be higher overall, with a wider spread and a noticeably higher median. While there is still some overlap at the lower end, the shape of the distributions shows that higher bid prices are much more common among winning bids. Overall, the plot suggests that even modestly higher bids are associated with better odds of winning.
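The visual comparison can be backed by a small numeric summary per outcome. The sketch below uses invented prices (not the bids data) with the same column names as `bids_clean`:

```r
library(dplyr)

# Invented prices for illustration; column names mirror bids_clean.
toy <- tibble(
  BID_WON_clean = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE"),
  PRICE_final   = c(0.01, 0.02, 0.03, 0.05, 0.10, 0.30)
)

# Median and interquartile range of bid price for lost vs won bids.
price_summary <- toy %>%
  group_by(BID_WON_clean) %>%
  summarise(
    n_bids       = n(),
    median_price = median(PRICE_final, na.rm = TRUE),
    q25          = unname(quantile(PRICE_final, 0.25, na.rm = TRUE)),
    q75          = unname(quantile(PRICE_final, 0.75, na.rm = TRUE)),
    .groups      = "drop"
  )
```

Reporting the medians and quartiles next to the figure makes the "higher median for winning bids" claim checkable at a glance.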

0.4 Publishers spatial price representation

# Hexbin map showing spatial bid patterns by publisher

ggplot(bids_clean, aes(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean)) +
  stat_summary_hex(aes(z = PRICE_final), fun = mean) +   # hexbin with mean price per hex
  coord_equal() +                                        # preserve geographic proportions
  facet_wrap(~ PUBLISHER_ID) +                           # separate map panel per publisher
  # A plot can only have one fill scale, so the two original calls are merged
  scale_fill_viridis_c(
    option = "cividis",        # cividis for the continuous scale
    trans = "log",             # log scale for contrast
    name = "Avg Bid Price\n(log scale)"
  ) +
  labs(
    title = "Spatial Patterns of Publisher Bid Prices Across Oregon",
    x = "Longitude",
    y = "Latitude"
  )

0.4.1 Plot summary

The faceted map shows how publishers vary not only in the prices bid but also in where those bids appear across Oregon. Some publishers have activity spread broadly throughout the state, while others are concentrated in just a few regions. Within each publisher’s panel, the color shading highlights differences in bid intensity: lighter areas represent higher average prices, while darker areas reflect lower prices. The patterns suggest that publishers target different geographic areas and that bidding strategies may differ depending on where users are located. The map therefore reveals a distinct spatial footprint for each publisher, with both bidding levels and geographic focus varying considerably from one publisher to another.

0.5 Cleaning NAs for modeling

# Clean binary column from the original character ones
bids_clean$BID_WON_clean <- ifelse(bids_clean$BID_WON_clean == "TRUE", 1,
                                   ifelse(bids_clean$BID_WON_clean == "FALSE", 0, NA))

# drop NAs
bids_clean <- bids_clean %>%
  filter(
    !is.na(BID_WON_clean),
    !is.na(PRICE_final),
    !is.na(DEVICE_GEO_LONG_clean),
    !is.na(DEVICE_GEO_LAT_clean)
  )
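The same recode-and-filter step could also be written with `case_when()` and `tidyr::drop_na()`, in line with the tidyverse note in Setup. This is a sketch on a made-up four-row tibble, not a replacement for the code above:

```r
library(dplyr)
library(tidyr)

# Toy rows standing in for bids_clean; values are invented for illustration.
toy <- tibble(
  BID_WON_clean         = c("TRUE", "FALSE", "TRUE", NA),
  PRICE_final           = c(0.23, NA, 0.04, 0.10),
  DEVICE_GEO_LONG_clean = c(-123.0, -122.6, -123.1, -122.9),
  DEVICE_GEO_LAT_clean  = c(44.9, 45.5, 44.0, 45.4)
)

model_data <- toy %>%
  # "TRUE" -> 1, "FALSE" -> 0, anything else (including NA) -> NA
  mutate(BID_WON_clean = case_when(
    BID_WON_clean == "TRUE"  ~ 1,
    BID_WON_clean == "FALSE" ~ 0
  )) %>%
  # Drop rows that are missing any of the modeling variables
  drop_na(BID_WON_clean, PRICE_final, DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean)
```

Writing the result to a new object such as `model_data` (a name chosen here for illustration) would also leave `bids_clean` untouched for the other analyses.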

0.6 Predicting win probability from price and location

set.seed(123)   # ensure the train/test split is reproducible

# Determine number of rows in the dataset
n <- nrow(bids_clean)

# Randomly select 60% of rows for training
train_index <- sample(seq_len(n), size = floor(0.6 * n))  # floor() makes the integer sample size explicit

# Split the dataset into training and testing sets
train_data <- bids_clean[train_index, ]   # 60% training data
test_data  <- bids_clean[-train_index, ]  # remaining 40% test data

# Fit a BAM model (fast GAM) to predict win probability
model <- bam(
  BID_WON_clean ~
    s(PRICE_final, k = 10) +                            # smooth effect of bid price
    te(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean,     # 2D smooth for spatial effects
       k = c(8, 8)),
  data = train_data,                                     # training dataset
  family = binomial(),                                   # binary outcome (win or lose)
  discrete = TRUE                                        # speed optimization for big data
)

summary(model)   # show model fit statistics and significance
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## BID_WON_clean ~ s(PRICE_final, k = 10) + te(DEVICE_GEO_LONG_clean, 
##     DEVICE_GEO_LAT_clean, k = c(8, 8))
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.81966    0.09377   19.41   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                                   edf Ref.df Chi.sq p-value    
## s(PRICE_final)                                  8.833  8.983  65469  <2e-16 ***
## te(DEVICE_GEO_LONG_clean,DEVICE_GEO_LAT_clean) 49.184 53.880   1445  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.345   Deviance explained = 30.2%
## fREML = 3.5153e+05  Scale est. = 1         n = 264589

0.6.1 Model Interpretation

The model shows a strong and clear relationship between how much an advertiser bids and their chances of winning the auction. The smooth term for bid amount (s(PRICE_final)) is highly significant (p < 2e-16), meaning changes in bid price have a meaningful effect on the probability of winning. In general, higher bids increase the likelihood of winning, which aligns with how auction systems typically function.

The spatial smooth term (te(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean), a tensor-product smooth of longitude and latitude) is also significant, indicating that location matters as well: some geographic areas are more competitive than others.

The model explains about 30% of the deviance and has an adjusted R-squared of 0.345, which is reasonable for behavioral auction data. Overall, the results suggest that while bid amount is a major driver of winning probability, location-based competition also plays an important role.
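For reference, "deviance explained" is one minus the ratio of the model's residual deviance to the null (intercept-only) deviance. The sketch below illustrates the calculation with a plain logistic `glm()` on simulated data rather than the `bam()` fit above; the simulated coefficients are arbitrary and the resulting number is not comparable to the 30.2% reported for our model.

```r
set.seed(1)

# Simulated bids: win probability rises with price (arbitrary coefficients).
n_sim <- 2000
price <- runif(n_sim, 0, 1)
won   <- rbinom(n_sim, size = 1, prob = plogis(-2 + 5 * price))

# Logistic regression of outcome on price, plus the intercept-only baseline.
fit      <- glm(won ~ price, family = binomial())
fit_null <- glm(won ~ 1, family = binomial())

# Deviance explained = 1 - residual deviance / null deviance
dev_explained <- 1 - fit$deviance / fit$null.deviance
```

The `null.deviance` stored on the fitted model is the deviance of the intercept-only baseline, so the ratio measures how much of that baseline lack of fit the predictors remove.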

0.7 Model discrimination accuracy

# Predict on test data
test_data$pred_prob <- predict(model, newdata = test_data, type = "response")

# Compute AUC
roc_obj <- roc(test_data$BID_WON_clean, test_data$pred_prob)
auc_value <- auc(roc_obj)

print(auc_value)
## Area under the curve: 0.86

0.7.1 AUC Interpretation

The AUC of 0.86 indicates that the model separates winning and losing bids well: given a randomly chosen winning bid and a randomly chosen losing bid from the test set, the model assigns the higher win probability to the winner about 86% of the time. This is strong discriminative power, which supports using the model to compare bid prices, to identify locations where bids tend to perform better or worse, and to rank new bids by their likelihood of winning.
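The pairwise reading of AUC can be checked directly in base R by comparing every winner's score with every loser's score. The scores below are invented for illustration and are unrelated to the model's predictions:

```r
# Invented outcomes and predicted probabilities: 1 = won, 0 = lost.
won  <- c(1, 1, 1, 0, 0, 0, 0)
prob <- c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1)

winner_scores <- prob[won == 1]
loser_scores  <- prob[won == 0]

# For every (winner, loser) pair: TRUE if the winner scores higher,
# and a separate matrix marking exact ties.
winner_higher <- outer(winner_scores, loser_scores, FUN = ">")
tied          <- outer(winner_scores, loser_scores, FUN = "==")

# AUC = share of pairs ranked correctly, counting ties as half.
auc_est <- mean(winner_higher + 0.5 * tied)
```

Here 11 of the 12 winner-loser pairs are ranked correctly, so `auc_est` is 11/12, roughly 0.92.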

0.8 Trend to our predicted model

# Predicted win probability as a function of bid price

ggplot(test_data, aes(x = PRICE_final, y = pred_prob, color = pred_prob)) +
  geom_point(alpha = 0.3) +
  geom_smooth(color = "black", se = FALSE, linewidth = 1.2) +   # linewidth replaces the deprecated size for lines
  scale_color_viridis_c(option = "cividis") +
  labs(
    title = "Predicted Win Probability vs Bid Price",
    x = "Bid Price",
    y = "Predicted Win Probability",
    color = "Win Prob"
  ) +
  theme_minimal(base_size = 13)

0.8.1 Figure summary

The figure shows how win probability changes with bid price. Win chances rise quickly at low prices, meaning small increases make a big difference. As prices move higher, the improvement slows, and eventually the probability levels off, suggesting that once a bid is competitive, raising the price further adds little benefit.

0.9 Spatial representation of win probability across cities

# Predict win probability using the trained model
bids_clean$pred_prob <- predict(
  model,
  newdata = bids_clean,
  type = "response"
)

# Prepare data for hex plotting
bids_pred_hex <- bids_clean %>%
  mutate(
    DEVICE_GEO_LONG_clean = as.numeric(DEVICE_GEO_LONG_clean),
    DEVICE_GEO_LAT_clean  = as.numeric(DEVICE_GEO_LAT_clean)
  ) %>%
  filter(
    is.finite(DEVICE_GEO_LONG_clean),
    is.finite(DEVICE_GEO_LAT_clean),
    is.finite(pred_prob)
  )

# Oregon state outline
or_state <- states(cb = TRUE, year = 2023, class = "sf") %>%
  filter(STUSPS == "OR") %>%
  st_transform(4326)

# Counties shapefile for Oregon
or_counties <- tigris::counties(state = "OR", cb = TRUE, year = 2023, class = "sf") %>%
  sf::st_transform(4326)

# Plot Model-Predicted Win Probability

ggplot() +
  # Oregon boundary
  geom_sf(data = or_state, fill = "gray95", color = "gray50", linewidth = 0.4) +
  # Oregon counties
  geom_sf(data = or_counties, fill = NA, color = "gray60", linewidth = 0.3) +
  # Predicted win-probability hex map
  stat_summary_hex(
    data = bids_pred_hex,
    aes(
      x = DEVICE_GEO_LONG_clean,
      y = DEVICE_GEO_LAT_clean,
      z = pred_prob
    ),
    fun  = mean,
    bins = 50,
    alpha = 0.85
  ) +

  # Color scale for predictions
  scale_fill_viridis_c(
    option = "plasma",
    name   = "Predicted\nWin Probability",
    limits = c(0, 1)
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Predicted Win Probability Across Oregon Cities",
    subtitle = "Spatial representation using Big Additive Model probability estimates",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11),
    legend.position = "right"
  )

# convert back to char
bids_clean$BID_WON_clean <- ifelse(bids_clean$BID_WON_clean == 1, "TRUE", "FALSE")

0.9.1 Plot Summary

The map shows how predicted win probabilities vary across Oregon. Higher probabilities cluster around major populated areas, especially in the northwest region, while many rural areas show lower or more scattered win chances. This suggests that auction competitiveness and bidding dynamics differ by location, with some cities consistently offering more favorable conditions for winning bids than others.

0.9.2 3. Auction Outcomes

#Library
library(dplyr)
library(ggplot2)
library(tidyr)
library(corrplot)
library(lubridate)
library(forcats)
library(gridExtra)
library(ggrepel)

#1. Which features are most predictive of a win?

# Define Cramér's V function
cramers_v <- function(x, y) {
  # Create contingency table
  confusion_matrix <- table(x, y)

  # Calculate chi-squared test
  chi2 <- chisq.test(confusion_matrix)

  # Get dimensions and sample size
  n <- sum(confusion_matrix)
  k <- min(dim(confusion_matrix))

  # Calculate Cramér's V
  v <- sqrt(chi2$statistic / (n * (k - 1)))

  return(as.numeric(v))
}

# Now run the categorical analysis again
cat("\n")
cat(strrep("=", 70), "\n")
## ======================================================================
cat("CATEGORICAL VARIABLE ANALYSIS WITH BID_WON\n")
## CATEGORICAL VARIABLE ANALYSIS WITH BID_WON
cat(strrep("=", 70), "\n\n")
## ======================================================================
# Define categorical variables for analysis
categorical_vars <- c("DEVICE_TYPE_clean", "DEVICE_GEO_REGION_clean",
                     "DEVICE_GEO_CITY_clean", "SIZE", "DAY_OF_WEEK")

# Prepare the data for correlation analysis
bids_analysis <- bids_clean %>%
  # Convert BID_WON to numeric (0/1)
  mutate(
    BID_WON_numeric = as.numeric(as.logical(BID_WON_clean)),

    # Extract date/time features
    HOUR = hour(TIMESTAMP_clean),
    DAY_OF_WEEK = wday(DATE_UTC_clean, label = TRUE, week_start = 1),
    DAY_OF_WEEK_NUM = as.numeric(DAY_OF_WEEK),
    MONTH = month(DATE_UTC_clean),
    DAY = day(DATE_UTC_clean),

    # Convert DEVICE_TYPE to factor
    DEVICE_TYPE_fct = as.factor(DEVICE_TYPE_clean),

    # Create ad size area from SIZE column
    SIZE_WIDTH = as.numeric(str_extract(SIZE, "^[0-9]+")),
    SIZE_HEIGHT = as.numeric(str_extract(SIZE, "[0-9]+$")),
    SIZE_AREA = SIZE_WIDTH * SIZE_HEIGHT,

    # Extract number of requested sizes
    NUM_REQUESTED_SIZES = map_int(REQUESTED_SIZES_clean, length),

    # Create region indicator
    REGION_OR = as.numeric(DEVICE_GEO_REGION_clean == "OR"),

    # Log-transform price for better distribution
    PRICE_log = log(PRICE_final + 0.001),

    # Log-transform response time
    RESPONSE_TIME_log = log(RESPONSE_TIME_clean + 1)
  )

categorical_results <- list()

for (var in categorical_vars) {
  if (var %in% names(bids_analysis)) {
    # Remove NA values for this analysis
    temp_data <- bids_analysis %>%
      select(!!sym(var), BID_WON_numeric) %>%
      filter(!is.na(!!sym(var))) %>%
      na.omit()

    if(nrow(temp_data) > 0 && length(unique(temp_data[[var]])) > 1) {
      # Calculate win rates
      win_rates <- temp_data %>%
        group_by(!!sym(var)) %>%
        summarise(
          count = n(),
          win_rate = mean(BID_WON_numeric, na.rm = TRUE),
          .groups = "drop"
        ) %>%
        arrange(desc(win_rate))

      # Calculate Cramér's V with error handling
      tryCatch({
        cramer_v <- cramers_v(temp_data[[var]], temp_data$BID_WON_numeric)

        cat("Variable:", var, "\n")
        cat("Cramér's V:", round(cramer_v, 4), "\n")
        cat("Sample size:", nrow(temp_data), "\n")
        cat("Unique categories:", length(unique(temp_data[[var]])), "\n")

        if(nrow(win_rates) <= 10) {
          cat("All categories:\n")
          print(win_rates)
        } else {
          cat("Top 5 categories by win rate:\n")
          print(head(win_rates, 5))
          cat("\nBottom 5 categories by win rate:\n")
          print(tail(win_rates, 5))
        }

        # Store results
        categorical_results[[var]] <- list(
          cramer_v = cramer_v,
          win_rates = win_rates,
          n = nrow(temp_data),
          n_categories = length(unique(temp_data[[var]]))
        )

      }, error = function(e) {
        cat("Variable:", var, "\n")
        cat("Could not calculate Cramér's V:", e$message, "\n")
      })

      cat("\n")
      cat(strrep("-", 50), "\n\n")
    } else {
      cat("Variable:", var, "\n")
      cat("Insufficient data or only one category\n\n")
      cat(strrep("-", 50), "\n\n")
    }
  }
}
## Variable: DEVICE_TYPE_clean 
## Cramér's V: 0.1328 
## Sample size: 440983 
## Unique categories: 5 
## All categories:
## # A tibble: 5 × 3
##   DEVICE_TYPE_clean  count win_rate
##   <chr>              <int>    <dbl>
## 1 2                  24583    0.474
## 2 0                   3694    0.370
## 3 1                 228990    0.289
## 4 5                    318    0.286
## 5 4                 183398    0.223
## 
## -------------------------------------------------- 
## 
## Variable: DEVICE_GEO_REGION_clean 
## Insufficient data or only one category
## 
## --------------------------------------------------
## Variable: DEVICE_GEO_CITY_clean 
## Cramér's V: 0.0927 
## Sample size: 440983 
## Unique categories: 202 
## Top 5 categories by win rate:
## # A tibble: 5 × 3
##   DEVICE_GEO_CITY_clean count win_rate
##   <chr>                 <int>    <dbl>
## 1 Canyon City               1        1
## 2 Columbia City             1        1
## 3 Drewsey                   1        1
## 4 Government Camp           1        1
## 5 Weston                    1        1
## 
## Bottom 5 categories by win rate:
## # A tibble: 5 × 3
##   DEVICE_GEO_CITY_clean count win_rate
##   <chr>                 <int>    <dbl>
## 1 North Powder             84    0.119
## 2 Oakridge                 60    0.117
## 3 Prospect                  9    0.111
## 4 O'Brien                  27    0.111
## 5 Elmira                  100    0.1  
## 
## --------------------------------------------------
## Variable: SIZE 
## Cramér's V: 0.1471 
## Sample size: 440983 
## Unique categories: 38 
## Top 5 categories by win rate:
## # A tibble: 5 × 3
##   SIZE      count win_rate
##   <chr>     <int>    <dbl>
## 1 1080x1080     2        1
## 2 1080x566      1        1
## 3 1140x635      3        1
## 4 320x106       2        1
## 5 750x570       1        1
## 
## Bottom 5 categories by win rate:
## # A tibble: 5 × 3
##   SIZE      count win_rate
##   <chr>     <int>    <dbl>
## 1 300x251     319   0.0658
## 2 1140x600      1   0     
## 3 1280x1280     1   0     
## 4 970x500       9   0     
## 5 970x66        1   0     
## 
## --------------------------------------------------
## Variable: DAY_OF_WEEK 
## Cramér's V: NaN 
## Sample size: 440983 
## Unique categories: 2 
## All categories:
## # A tibble: 2 × 3
##   DAY_OF_WEEK  count win_rate
##   <ord>        <int>    <dbl>
## 1 Wed         273426    0.273
## 2 Tue         167557    0.272
## 
## --------------------------------------------------
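The `NaN` reported for DAY_OF_WEEK most likely comes from empty factor levels: `wday(..., label = TRUE)` returns an ordered factor with all seven weekdays, but only Tue and Wed occur in the data, so the contingency table contains all-zero rows and the chi-squared statistic becomes 0/0. The sketch below shows one possible fix, dropping unused levels before tabulating. The helper name `cramers_v_dropped` and the toy data are hypothetical, and `correct = FALSE` (no Yates correction for 2x2 tables) is an assumption that differs from the default used above.

```r
# Sketch: Cramér's V that drops unused factor levels before tabulating
cramers_v_dropped <- function(x, y) {
  tab  <- table(droplevels(as.factor(x)), droplevels(as.factor(y)))
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n    <- sum(tab)
  k    <- min(dim(tab))
  as.numeric(sqrt(chi2$statistic / (n * (k - 1))))
}

# Hypothetical toy data: seven weekday levels, only two observed
weekdays_all <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
days <- factor(c("Tue", "Tue", "Tue", "Wed", "Wed", "Wed"),
               levels = weekdays_all, ordered = TRUE)
won  <- c(1, 0, 0, 1, 1, 0)

# With the empty levels kept, the statistic is NaN
stat_raw <- suppressWarnings(chisq.test(table(days, won))$statistic)

# After dropping empty levels, the result is finite
v_fixed <- cramers_v_dropped(days, won)

print(c(raw_statistic = as.numeric(stat_raw), cramers_v = v_fixed))
```

Rerunning the categorical analysis with a helper like this would be needed to obtain an actual value for DAY_OF_WEEK; no such value is reported here.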
# Let's also calculate some simpler metrics for categorical variables
cat("\n")
cat(strrep("=", 70), "\n")
## ======================================================================
cat("ALTERNATIVE CATEGORICAL ANALYSIS: RELATIVE WIN RATES\n")
## ALTERNATIVE CATEGORICAL ANALYSIS: RELATIVE WIN RATES
cat(strrep("=", 70), "\n\n")
## ======================================================================
# For each categorical variable, calculate the range of win rates
for (var in categorical_vars) {
  if (var %in% names(bids_analysis)) {
    temp_data <- bids_analysis %>%
      select(!!sym(var), BID_WON_numeric) %>%
      filter(!is.na(!!sym(var))) %>%
      na.omit()

    if(nrow(temp_data) > 0 && length(unique(temp_data[[var]])) > 1) {
      win_summary <- temp_data %>%
        group_by(!!sym(var)) %>%
        summarise(
          count = n(),
          win_rate = mean(BID_WON_numeric, na.rm = TRUE),
          .groups = "drop"
        ) %>%
        filter(count > 10)  # Only consider categories with enough data

      if(nrow(win_summary) > 1) {
        win_range <- max(win_summary$win_rate) - min(win_summary$win_rate)
        best_category <- win_summary[which.max(win_summary$win_rate), ]
        worst_category <- win_summary[which.min(win_summary$win_rate), ]

        cat("Variable:", var, "\n")
        cat("Win rate range:", round(win_range, 4), "\n")
        cat("Best category:", best_category[[1]],
            "(win rate:", round(best_category$win_rate, 4),
            ", n:", best_category$count, ")\n")
        cat("Worst category:", worst_category[[1]],
            "(win rate:", round(worst_category$win_rate, 4),
            ", n:", worst_category$count, ")\n")
        cat("Relative difference:", round(best_category$win_rate / worst_category$win_rate, 2), "x\n")
        cat("\n")
      }
    }
  }
}
## Variable: DEVICE_TYPE_clean 
## Win rate range: 0.2506 
## Best category: 2 (win rate: 0.4739 , n: 24583 )
## Worst category: 4 (win rate: 0.2233 , n: 183398 )
## Relative difference: 2.12 x
## 
## Variable: DEVICE_GEO_CITY_clean 
## Win rate range: 0.6143 
## Best category: Jordan Valley (win rate: 0.7143 , n: 14 )
## Worst category: Elmira (win rate: 0.1 , n: 100 )
## Relative difference: 7.14 x
## 
## Variable: SIZE 
## Win rate range: 0.7913 
## Best category: 320x107 (win rate: 0.8571 , n: 14 )
## Worst category: 300x251 (win rate: 0.0658 , n: 319 )
## Relative difference: 13.02 x
## 
## Variable: DAY_OF_WEEK 
## Win rate range: 0.0015 
## Best category: 3 (win rate: 0.2735 , n: 273426 )
## Worst category: 2 (win rate: 0.2719 , n: 167557 )
## Relative difference: 1.01 x
# Let's create a comprehensive summary table of all features
cat("\n")
cat(strrep("=", 70), "\n")
## ======================================================================
cat("COMPREHENSIVE FEATURE IMPORTANCE SUMMARY\n")
## COMPREHENSIVE FEATURE IMPORTANCE SUMMARY
cat(strrep("=", 70), "\n\n")
## ======================================================================
# Collect all metrics in one data frame
feature_importance_summary <- data.frame()

# 1. Add numerical features from correlation
# First, create the numerical features dataset
numerical_features_clean <- bids_analysis %>%
  select(
    BID_WON_numeric,
    PRICE_final,
    PRICE_log,
    RESPONSE_TIME_clean,
    RESPONSE_TIME_log,
    SIZE_WIDTH,
    SIZE_HEIGHT,
    SIZE_AREA,
    NUM_REQUESTED_SIZES,
    HOUR,
    DAY_OF_WEEK_NUM,
    DAY,
    MONTH,
    DEVICE_GEO_LAT_clean,
    DEVICE_GEO_LONG_clean,
    REGION_OR
  ) %>%
  # Remove rows with any missing values for correlation
  na.omit() %>%
  # Remove infinite values if they exist
  mutate(across(everything(), ~ ifelse(is.infinite(.), NA, .))) %>%
  na.omit()

# Calculate correlation matrix
cor_matrix <- cor(numerical_features_clean, use = "complete.obs", method = "pearson")

# Get correlation with BID_WON
bid_won_correlations <- cor_matrix["BID_WON_numeric", ]
bid_won_cor_sorted <- sort(abs(bid_won_correlations), decreasing = TRUE)

# Create a nice formatted dataframe
bid_won_cor_df <- data.frame(
  Feature = names(bid_won_cor_sorted),
  Correlation = round(bid_won_correlations[names(bid_won_cor_sorted)], 4),
  Absolute_Correlation = round(bid_won_cor_sorted, 4)
) %>%
  filter(Feature != "BID_WON_numeric")  # Remove self-correlation


for (feature in bid_won_cor_df$Feature) {
  if(feature != "BID_WON_numeric") {
    cor_value <- bid_won_cor_df$Correlation[bid_won_cor_df$Feature == feature]
    feature_importance_summary <- rbind(feature_importance_summary,
                                        data.frame(
                                          Feature = feature,
                                          Type = "Numerical",
                                          Metric = "Pearson Correlation",
                                          Value = abs(cor_value),
                                          Direction = ifelse(cor_value > 0, "Positive", "Negative"),
                                          Importance_Level = case_when(
                                            abs(cor_value) >= 0.3 ~ "High",
                                            abs(cor_value) >= 0.1 ~ "Medium",
                                            TRUE ~ "Low"
                                          )
                                        ))
  }
}

# 2. Add categorical features from Cramér's V
for (var in names(categorical_results)) {
  feature_importance_summary <- rbind(feature_importance_summary,
                                      data.frame(
                                        Feature = var,
                                        Type = "Categorical",
                                        Metric = "Cramér's V",
                                        Value = categorical_results[[var]]$cramer_v,
                                        Direction = "Variable",
                                        Importance_Level = case_when(
                                          categorical_results[[var]]$cramer_v >= 0.3 ~ "High",
                                          categorical_results[[var]]$cramer_v >= 0.1 ~ "Medium",
                                          TRUE ~ "Low"
                                        )
                                      ))
}

# Sort by importance
feature_importance_summary <- feature_importance_summary %>%
  arrange(desc(Value))

cat("ALL FEATURES RANKED BY PREDICTIVE POWER:\n")
## ALL FEATURES RANKED BY PREDICTIVE POWER:
cat(strrep("-", 70), "\n")
## ----------------------------------------------------------------------
print(feature_importance_summary)
##                  Feature        Type              Metric      Value Direction
## 1              PRICE_log   Numerical Pearson Correlation 0.51470000  Positive
## 2            PRICE_final   Numerical Pearson Correlation 0.48690000  Positive
## 3                   SIZE Categorical          Cramér's V 0.14714903  Variable
## 4      DEVICE_TYPE_clean Categorical          Cramér's V 0.13277789  Variable
## 5              SIZE_AREA   Numerical Pearson Correlation 0.12050000  Positive
## 6            SIZE_HEIGHT   Numerical Pearson Correlation 0.12010000  Positive
## 7  DEVICE_GEO_CITY_clean Categorical          Cramér's V 0.09273084  Variable
## 8    NUM_REQUESTED_SIZES   Numerical Pearson Correlation 0.04880000  Positive
## 9    RESPONSE_TIME_clean   Numerical Pearson Correlation 0.03780000  Positive
## 10            SIZE_WIDTH   Numerical Pearson Correlation 0.03330000  Positive
## 11  DEVICE_GEO_LAT_clean   Numerical Pearson Correlation 0.02790000  Positive
## 12     RESPONSE_TIME_log   Numerical Pearson Correlation 0.02660000  Positive
## 13 DEVICE_GEO_LONG_clean   Numerical Pearson Correlation 0.01230000  Positive
## 14                  HOUR   Numerical Pearson Correlation 0.00190000  Negative
## 15       DAY_OF_WEEK_NUM   Numerical Pearson Correlation 0.00170000  Positive
## 16                   DAY   Numerical Pearson Correlation 0.00170000  Positive
## 17           DAY_OF_WEEK Categorical          Cramér's V        NaN  Variable
##    Importance_Level
## 1              High
## 2              High
## 3            Medium
## 4            Medium
## 5            Medium
## 6            Medium
## 7               Low
## 8               Low
## 9               Low
## 10              Low
## 11              Low
## 12              Low
## 13              Low
## 14              Low
## 15              Low
## 16              Low
## 17              Low
cat(strrep("-", 70), "\n\n")
## ----------------------------------------------------------------------
# Create a visualization of feature importance
ggplot(feature_importance_summary, aes(x = reorder(Feature, Value), y = Value, fill = Importance_Level)) +
  geom_bar(stat = "identity", alpha = 0.8) +
  geom_text(aes(label = round(Value, 3)),
            hjust = -0.1, size = 3.5) +
  coord_flip() +
  scale_fill_manual(values = c("High" = "#E41A1C", "Medium" = "#377EB8", "Low" = "#4DAF4A")) +
  labs(
    title = "Feature Importance for Predicting Bid Wins",
    subtitle = "Higher values indicate stronger predictive power",
    x = "Feature",
    y = "Predictive Strength (Correlation or Cramér's V)",
    fill = "Importance Level"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

# Get device-specific win rates
device_rates <- categorical_results$DEVICE_TYPE_clean$win_rates
best_device <- device_rates[which.max(device_rates$win_rate), ]
worst_device <- device_rates[which.min(device_rates$win_rate), ]

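Conclusion Q1: PRICE IS KING

- Bid price has the strongest association with winning of any feature examined (Pearson correlation of 0.487 for PRICE_final and 0.515 for PRICE_log).
- Higher bids go with higher win rates, which makes bid price the most promising lever for increasing win probability. These are correlations, so on their own they do not establish a causal effect.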
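The Pearson correlations in the ranking treat the 0/1 win indicator as a numeric variable. In that setting the coefficient is the point-biserial correlation, which is a scaled difference between the mean price of winning and losing bids. The sketch below checks this identity on made-up numbers; the toy vectors and object names are hypothetical.

```r
# Hypothetical toy data: bid prices and a 0/1 win indicator
price <- c(0.2, 0.5, 0.3, 0.9, 1.2, 0.8, 0.4, 1.5)
won   <- c(0,   0,   0,   1,   1,   1,   0,   1)

# Pearson correlation, treating the outcome as numeric
r_pearson <- cor(price, won)

# Point-biserial form: scaled gap between mean winning and mean losing price
p_win  <- mean(won)
sd_pop <- sqrt(mean((price - mean(price))^2))  # population SD (divides by n)
r_point_biserial <- (mean(price[won == 1]) - mean(price[won == 0])) /
  sd_pop * sqrt(p_win * (1 - p_win))

print(c(pearson = r_pearson, point_biserial = r_point_biserial))
```

Read this way, a correlation near 0.5 says that winning bids sit noticeably higher on the price scale than losing bids, relative to the overall spread in price.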

#2. Systematic differences across devices, regions, or ad categories

# MULTIVARIATE ANALYSIS WITH ROBUST FACTOR HANDLING
cat("\n")
cat(strrep("=", 80), "\n")
## ================================================================================
cat("MULTIVARIATE ANALYSIS: SYSTEMATIC DIFFERENCES IN OUTCOMES\n")
## MULTIVARIATE ANALYSIS: SYSTEMATIC DIFFERENCES IN OUTCOMES
cat(strrep("=", 80), "\n\n")
## ================================================================================
# 1. CAREFULLY PREPARE DATA WITH FACTOR CHECKING
analysis_data <- bids_analysis %>%
  # Select relevant variables
  select(
    BID_WON_numeric,
    PRICE_final,
    RESPONSE_TIME_clean,
    HOUR,
    DEVICE_TYPE_clean,
    DEVICE_GEO_REGION_clean,
    SIZE,
    SIZE_AREA,
    NUM_REQUESTED_SIZES
  ) %>%
  # Remove any rows with missing values
  drop_na() %>%
  # Convert to proper types
  mutate(
    # Ensure character type
    DEVICE_TYPE_clean = as.character(DEVICE_TYPE_clean),
    DEVICE_GEO_REGION_clean = as.character(DEVICE_GEO_REGION_clean),
    SIZE = as.character(SIZE)
  )

# Check unique values for each potential factor
cat("Unique value counts before filtering:\n")
## Unique value counts before filtering:
cat("- Device types:", length(unique(analysis_data$DEVICE_TYPE_clean)), "\n")
## - Device types: 5
cat("- Regions:", length(unique(analysis_data$DEVICE_GEO_REGION_clean)), "\n")
## - Regions: 1
cat("- Ad sizes:", length(unique(analysis_data$SIZE)), "\n\n")
## - Ad sizes: 38
# Filter to ensure we have multiple levels for each factor
analysis_data_filtered <- analysis_data %>%
  # Group by combination of factors and count
  group_by(DEVICE_TYPE_clean, DEVICE_GEO_REGION_clean, SIZE) %>%
  mutate(group_count = n()) %>%
  ungroup() %>%
  # Flag whether each factor varies among well-populated groups.
  # Each flag is computed over all rows, so it is a single TRUE/FALSE
  # repeated on every row.
  mutate(
    # TRUE if more than one device type appears in groups with > 10 rows
    device_ok = length(unique(DEVICE_TYPE_clean[group_count > 10])) > 1,
    # TRUE if more than one region appears in groups with > 10 rows
    region_ok = length(unique(DEVICE_GEO_REGION_clean[group_count > 10])) > 1,
    # TRUE if more than one ad size appears in groups with > 10 rows
    size_ok = length(unique(SIZE[group_count > 10])) > 1
  ) %>%
  # Keep rows only if every factor varies; if any flag is FALSE, all rows are dropped
  filter(device_ok & region_ok & size_ok) %>%
  # Convert to factors
  mutate(
    DEVICE_TYPE_clean = factor(DEVICE_TYPE_clean),
    DEVICE_GEO_REGION_clean = factor(DEVICE_GEO_REGION_clean),
    SIZE = factor(SIZE)
  ) %>%
  select(-device_ok, -region_ok, -size_ok, -group_count)

cat("After filtering - Unique value counts:\n")
## After filtering - Unique value counts:
cat("- Device types:", length(levels(analysis_data_filtered$DEVICE_TYPE_clean)), "\n")
## - Device types: 0
cat("- Regions:", length(levels(analysis_data_filtered$DEVICE_GEO_REGION_clean)), "\n")
## - Regions: 0
cat("- Ad sizes:", length(levels(analysis_data_filtered$SIZE)), "\n")
## - Ad sizes: 0
cat("Final sample size:", nrow(analysis_data_filtered), "\n\n")
## Final sample size: 0
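The final sample size of 0 follows from how the three flags are built. Each flag is a single TRUE/FALSE computed over the whole dataset, and the data contain only one region, so `region_ok` is FALSE on every row and the filter removes everything. The base-R sketch below reproduces the effect on a tiny made-up data frame and shows one possible alternative, checking which factors vary and keeping only those. The toy data and object names are hypothetical.

```r
# Hypothetical toy data: two device types, but a single region
toy <- data.frame(
  device = c("1", "1", "4", "4"),
  region = c("OR", "OR", "OR", "OR")
)

# Dataset-wide flags, as in the filtering step above
device_ok <- length(unique(toy$device)) > 1  # more than one device type
region_ok <- length(unique(toy$region)) > 1  # only one region present

# Requiring every flag to be TRUE drops all rows when one factor is constant
kept_all <- toy[rep(device_ok & region_ok, nrow(toy)), ]

# Alternative: identify the factors that vary and model only those
factors <- c("device", "region")
varying <- factors[vapply(toy[factors],
                          function(col) length(unique(col)) > 1,
                          logical(1))]

print(nrow(kept_all))
print(varying)
```

With an approach like this, a constant factor such as region would be left out of the model formula instead of emptying the dataset.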
# If we still don't have multiple levels, do univariate analyses instead
if (length(levels(analysis_data_filtered$DEVICE_TYPE_clean)) < 2 ||
    length(levels(analysis_data_filtered$DEVICE_GEO_REGION_clean)) < 2 ||
    length(levels(analysis_data_filtered$SIZE)) < 2) {

  cat("WARNING: Insufficient variation for full multivariate model.\n")
  cat("Performing separate univariate analyses instead.\n\n")

  # 2. SEPARATE UNIVARIATE ANALYSES
  cat("SEPARATE ANALYSES FOR EACH FACTOR\n")
  cat(strrep("-", 80), "\n\n")

  # Function for univariate analysis
  run_univariate_analysis <- function(data, factor_var, factor_name) {
    if (length(unique(data[[factor_var]])) > 1) {
      cat(paste("ANALYSIS FOR", factor_name, ":\n"))
      cat(strrep("-", 40), "\n")

      # Descriptive statistics
      desc_stats <- data %>%
        group_by(!!sym(factor_var)) %>%
        summarise(
          n = n(),
          win_rate = mean(BID_WON_numeric),
          avg_price = mean(PRICE_final),
          avg_response = mean(RESPONSE_TIME_clean),
          .groups = "drop"
        ) %>%
        arrange(desc(win_rate))

      cat("Descriptive Statistics:\n")
      print(desc_stats)
      cat("\n")

      # Statistical test (ANOVA for win rate differences)
      if (nrow(desc_stats) > 1) {
        aov_result <- aov(BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
        cat("ANOVA test for win rate differences:\n")
        print(summary(aov_result))

        # Tukey HSD post-hoc test if significant
        if (summary(aov_result)[[1]]$"Pr(>F)"[1] < 0.05) {
          cat("\nTukey HSD Post-hoc Comparisons:\n")
          tukey_result <- TukeyHSD(aov_result)
          print(tukey_result)
        }
        cat("\n")
      }

      # Visualize
      p <- ggplot(desc_stats, aes(x = reorder(!!sym(factor_var), win_rate), y = win_rate)) +
        geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
        geom_errorbar(aes(ymin = win_rate - 1.96*sqrt(win_rate*(1-win_rate)/n),
                          ymax = win_rate + 1.96*sqrt(win_rate*(1-win_rate)/n)),
                      width = 0.2) +
        geom_text(aes(label = paste0(round(win_rate*100, 1), "%\n(n=", n, ")")),
                  vjust = -0.3, size = 3) +
        labs(
          title = paste("Win Rate by", factor_name),
          x = factor_name,
          y = "Win Rate"
        ) +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

      print(p)
      cat(strrep("=", 80), "\n\n")
    } else {
      cat(paste("Insufficient variation in", factor_name, "for analysis\n\n"))
    }
  }

  # Run analyses for each factor
  run_univariate_analysis(analysis_data, "DEVICE_TYPE_clean", "Device Type")
  run_univariate_analysis(analysis_data, "DEVICE_GEO_REGION_clean", "Region")
  run_univariate_analysis(analysis_data, "SIZE", "Ad Size")

} else {
  # 3. PROCEED WITH MULTIVARIATE ANALYSIS
  cat("Proceeding with multivariate analysis...\n\n")

  # Model 1: Basic multivariate model
  basic_model <- glm(
    BID_WON_numeric ~
      scale(PRICE_final) +
      scale(RESPONSE_TIME_clean) +
      HOUR +
      scale(SIZE_AREA) +
      NUM_REQUESTED_SIZES +
      DEVICE_TYPE_clean +
      DEVICE_GEO_REGION_clean +
      SIZE,
    data = analysis_data_filtered,
    family = binomial()
  )

  cat("MODEL 1: MULTIVARIATE LOGISTIC REGRESSION\n")
  cat(strrep("-", 80), "\n")
  cat("Model coefficients (systematic differences after controlling for other factors):\n\n")

  # Extract and format results
  model_results <- broom::tidy(basic_model) %>%
    mutate(
      odds_ratio = exp(estimate),
      significance = case_when(
        p.value < 0.001 ~ "***",
        p.value < 0.01 ~ "**",
        p.value < 0.05 ~ "*",
        TRUE ~ ""
      ),
      variable_type = case_when(
        # Match the more specific patterns first: "SIZE" alone would also
        # capture SIZE_AREA and NUM_REQUESTED_SIZES
        grepl("SIZE_AREA", term) ~ "Ad Size Area",
        grepl("NUM_REQUESTED", term) ~ "# Requested Sizes",
        grepl("DEVICE_TYPE", term) ~ "Device Type",
        grepl("DEVICE_GEO_REGION", term) ~ "Region",
        grepl("SIZE", term) ~ "Ad Size",
        grepl("PRICE", term) ~ "Price",
        grepl("RESPONSE", term) ~ "Response Time",
        grepl("HOUR", term) ~ "Hour",
        TRUE ~ "Intercept"
      )
    ) %>%
    arrange(variable_type, term)

  # Print systematic differences
  systematic_diffs <- model_results %>%
    filter(variable_type %in% c("Device Type", "Region", "Ad Size"))

  if (nrow(systematic_diffs) > 0) {
    cat("SYSTEMATIC DIFFERENCES:\n")
    for (var_type in unique(systematic_diffs$variable_type)) {
      cat(paste("\n", var_type, "Effects (vs reference category):\n"))
      var_results <- systematic_diffs %>%
        filter(variable_type == var_type) %>%
        select(term, estimate, odds_ratio, p.value, significance)
      print(var_results, n = 20)
    }
  }

  cat("\nCONTROL VARIABLE EFFECTS:\n")
  control_vars <- model_results %>%
    filter(variable_type %in% c("Price", "Response Time", "Hour", "Ad Size Area", "# Requested Sizes"))
  print(control_vars, n = 10)

  # 4. MODEL COMPARISON TO QUANTIFY CATEGORY EFFECTS
  cat("\n\nMODEL 2: MODEL COMPARISON TO QUANTIFY CATEGORY EFFECTS\n")
  cat(strrep("-", 80), "\n")

  # Fit models without each category to see contribution
  models <- list()

  # Base model without categories
  models$base <- glm(
    BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR,
    data = analysis_data_filtered,
    family = binomial()
  )

  # Add each category separately
  models$with_device <- glm(
    BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR + DEVICE_TYPE_clean,
    data = analysis_data_filtered,
    family = binomial()
  )

  models$with_region <- glm(
    BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR + DEVICE_GEO_REGION_clean,
    data = analysis_data_filtered,
    family = binomial()
  )

  models$with_size <- glm(
    BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR + SIZE,
    data = analysis_data_filtered,
    family = binomial()
  )

  # Compare models using AIC
  model_comparison <- map_dfr(models, ~ data.frame(AIC = AIC(.x)), .id = "model") %>%
    mutate(
      delta_AIC = AIC - min(AIC),
      improvement_over_base = AIC[model == "base"] - AIC
    ) %>%
    arrange(AIC)

  cat("Model Comparison (lower AIC is better):\n")
  print(model_comparison)

  # 5. PREDICTED WIN PROBABILITIES
  cat("\n\nMODEL 3: PREDICTED WIN PROBABILITIES BY CATEGORY\n")
  cat(strrep("-", 80), "\n")

  # Create representative scenarios
  median_price <- median(analysis_data_filtered$PRICE_final)
  median_response <- median(analysis_data_filtered$RESPONSE_TIME_clean)
  median_hour <- 12

  # Get top categories for each factor
  top_device <- analysis_data_filtered %>%
    count(DEVICE_TYPE_clean) %>%
    arrange(desc(n)) %>%
    pull(DEVICE_TYPE_clean) %>%
    head(3)

  top_region <- analysis_data_filtered %>%
    count(DEVICE_GEO_REGION_clean) %>%
    arrange(desc(n)) %>%
    pull(DEVICE_GEO_REGION_clean) %>%
    head(3)

  top_size <- analysis_data_filtered %>%
    count(SIZE) %>%
    arrange(desc(n)) %>%
    pull(SIZE) %>%
    head(3)

  # Create prediction scenarios
  pred_scenarios <- expand.grid(
    PRICE_final = median_price,
    RESPONSE_TIME_clean = median_response,
    HOUR = median_hour,
    SIZE_AREA = median(analysis_data_filtered$SIZE_AREA),
    NUM_REQUESTED_SIZES = median(analysis_data_filtered$NUM_REQUESTED_SIZES),
    DEVICE_TYPE_clean = top_device,
    DEVICE_GEO_REGION_clean = top_region,
    SIZE = top_size,
    stringsAsFactors = TRUE
  )

  # Predict
  predictions <- pred_scenarios %>%
    mutate(
      predicted_win = predict(basic_model, newdata = ., type = "response"),
      scenario_id = paste(DEVICE_TYPE_clean, DEVICE_GEO_REGION_clean, SIZE, sep = " | ")
    ) %>%
    arrange(desc(predicted_win))

  cat("Top 10 Best Scenarios for Winning:\n")
  print(predictions %>%
          select(scenario_id, predicted_win) %>%
          head(10), n = 10)

  cat("\nBottom 10 Worst Scenarios for Winning:\n")
  print(predictions %>%
          select(scenario_id, predicted_win) %>%
          tail(10), n = 10)

  # 6. VISUALIZE CATEGORY EFFECTS
  cat("\n\nVISUALIZATION OF SYSTEMATIC DIFFERENCES\n")
  cat(strrep("-", 80), "\n")

  # Calculate adjusted win rates (controlling for price and response time)
  adjusted_data <- analysis_data_filtered %>%
    # Attach each row's own response-scale residual (observed minus fitted)
    # so the group means below are specific to each group
    mutate(model_residual = residuals(basic_model, type = "response")) %>%
    group_by(DEVICE_TYPE_clean, DEVICE_GEO_REGION_clean, SIZE) %>%
    summarise(
      n = n(),
      raw_win_rate = mean(BID_WON_numeric),
      # Adjusted using model residuals
      residual = mean(model_residual),
      adjusted_win_rate = raw_win_rate - residual,
      .groups = "drop"
    ) %>%
    filter(n > 10)

  # Plot device differences
  if (nrow(adjusted_data) > 0) {
    p1 <- adjusted_data %>%
      group_by(DEVICE_TYPE_clean) %>%
      summarise(
        adj_rate = weighted.mean(adjusted_win_rate, n),
        n_total = sum(n)
      ) %>%
      filter(n_total > 50) %>%
      ggplot(aes(x = reorder(DEVICE_TYPE_clean, adj_rate), y = adj_rate)) +
      geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
      geom_text(aes(label = round(adj_rate, 3)),
                vjust = -0.5, size = 3) +
      labs(
        title = "Adjusted Win Rate by Device Type",
        subtitle = "Controlling for price, response time, and other factors",
        x = "Device Type",
        y = "Adjusted Win Rate"
      ) +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

    print(p1)
  }
}
## WARNING: Insufficient variation for full multivariate model.
## Performing separate univariate analyses instead.
## 
## SEPARATE ANALYSES FOR EACH FACTOR
## -------------------------------------------------------------------------------- 
## 
## ANALYSIS FOR Device Type :
## ---------------------------------------- 
## Descriptive Statistics:
## # A tibble: 5 × 5
##   DEVICE_TYPE_clean      n win_rate avg_price avg_response
##   <chr>              <int>    <dbl>     <dbl>        <dbl>
## 1 2                  24583    0.474     0.696         246.
## 2 0                   3694    0.370     0.589         222.
## 3 1                 228990    0.289     0.493         210.
## 4 5                    318    0.286     1.01          256.
## 5 4                 183398    0.223     0.340         184.
## 
## ANOVA test for win rate differences:
##                                Df Sum Sq Mean Sq F value Pr(>F)    
## factor(data[[factor_var]])      4   1543   385.7    1978 <2e-16 ***
## Residuals                  440978  85958     0.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Tukey HSD Post-hoc Comparisons:
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
## 
## $`factor(data[[factor_var]])`
##             diff          lwr         upr     p adj
## 1-0 -0.080300223 -0.100274407 -0.06032604 0.0000000
## 2-0  0.104075206  0.082823525  0.12532689 0.0000000
## 4-0 -0.146536685 -0.166550246 -0.12652312 0.0000000
## 5-0 -0.083625325 -0.154007220 -0.01324343 0.0104672
## 2-1  0.184375428  0.176292508  0.19245835 0.0000000
## 4-1 -0.066236462 -0.070010358 -0.06246257 0.0000000
## 5-1 -0.003325102 -0.070906984  0.06425678 0.9999271
## 4-2 -0.250611890 -0.258791632 -0.24243215 0.0000000
## 5-2 -0.187700530 -0.255670941 -0.11973012 0.0000000
## 5-4  0.062911360 -0.004682171  0.13050489 0.0822208
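
The Ad Size results further below compare 38 sizes, which gives 38 * 37 / 2 = 703 pairwise rows, and several of those sizes have only one to three bids (for example 1080x566 with n = 1), so many intervals are very wide. One option, sketched here on made-up data, is to drop sparse levels before running `aov()` and `TukeyHSD()`. The data frame `toy` and the cutoff `min_n` are hypothetical; the cutoff simply mirrors the `n > 10` filter used in the code earlier.

```r
set.seed(42)
# Made-up bids: two common sizes and one size seen only twice
toy <- data.frame(
  size = c(rep("300x250", 200), rep("320x50", 200), rep("750x570", 2)),
  won  = c(rbinom(200, 1, 0.30), rbinom(200, 1, 0.20), 1, 1)
)

min_n <- 10  # hypothetical cutoff, mirroring the n > 10 filter used earlier

# Keep only sizes with more than min_n bids
size_counts <- table(toy$size)
keep_sizes  <- names(size_counts)[size_counts > min_n]
toy_kept    <- toy[toy$size %in% keep_sizes, ]
toy_kept$size <- factor(toy_kept$size)

# Same aov + TukeyHSD steps as in the report, on the reduced factor
fit   <- aov(won ~ size, data = toy_kept)
tukey <- TukeyHSD(fit)

# k remaining levels give k * (k - 1) / 2 pairwise rows
n_pairs <- nrow(tukey$size)
print(tukey)
```

Whether dropping or pooling sparse sizes is appropriate depends on the question being asked; the sketch only shows the mechanics.
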

## ================================================================================ 
## 
## Insufficient variation in Region for analysis
## 
## ANALYSIS FOR Ad Size :
## ---------------------------------------- 
## Descriptive Statistics:
## # A tibble: 38 × 5
##    SIZE          n win_rate avg_price avg_response
##    <chr>     <int>    <dbl>     <dbl>        <dbl>
##  1 1080x1080     2    1         2.64          215 
##  2 1080x566      1    1         1.16          208 
##  3 1140x635      3    1         1.31          565 
##  4 320x106       2    1         1.43          246.
##  5 750x570       1    1         6.26          309 
##  6 320x107      14    0.857     0.789         122 
##  7 640x480      91    0.791     0.564         187.
##  8 1140x250      3    0.667     1.08          271 
##  9 640x360       6    0.667     0.564         146.
## 10 480x360       5    0.6       0.631         193.
## # ℹ 28 more rows
## 
## ANOVA test for win rate differences:
##                                Df Sum Sq Mean Sq F value Pr(>F)    
## factor(data[[factor_var]])     37   1895   51.21   263.8 <2e-16 ***
## Residuals                  440945  85606    0.19                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Tukey HSD Post-hoc Comparisons:
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
## 
## $`factor(data[[factor_var]])`
##                              diff           lwr          upr     p adj
## 1080x1080-0x0        6.588889e-01 -0.5460674653  1.863845243 0.9854964
## 1080x566-0x0         6.588889e-01 -1.0442318633  2.362009641 0.9999868
## 1140x250-0x0         3.255556e-01 -0.6588324041  1.309943515 0.9999998
## 1140x600-0x0        -3.411111e-01 -2.0442318633  1.362009641 1.0000000
## 1140x635-0x0         6.588889e-01 -0.3254990708  1.643276849 0.8231857
## 1200x250-0x0        -7.777778e-03 -0.9921657374  0.976610182 1.0000000
## 1237x500-0x0        -9.419753e-02 -0.2394714980  0.051076436 0.8702139
## 1280x1280-0x0       -3.411111e-01 -2.0442318633  1.362009641 1.0000000
## 160x600-0x0          2.401163e-01  0.1231579328  0.357074719 0.0000000
## 180x150-0x0         -9.111111e-02 -0.9440880011  0.761865779 1.0000000
## 1x1-0x0              2.427350e-02 -0.2184991953  0.267046204 1.0000000
## 250x250-0x0          1.588889e-01 -0.6940880011  1.011865779 1.0000000
## 300x100-0x0         -1.744444e-01 -0.6690851368  0.320196248 0.9999987
## 300x1050-0x0        -2.146743e-01 -0.3343425689 -0.095006090 0.0000000
## 300x250-0x0          1.014517e-02 -0.0467880776  0.067078418 1.0000000
## 300x251-0x0         -2.752804e-01 -0.3861952044 -0.164365576 0.0000000
## 300x50-0x0          -6.186382e-02 -0.1197250739 -0.004002563 0.0186098
## 300x600-0x0          5.404241e-02 -0.0061095725  0.114194386 0.1708765
## 320x100-0x0         -2.577778e-01 -0.3897712636 -0.125784292 0.0000000
## 320x106-0x0          6.588889e-01 -0.5460674653  1.863845243 0.9854964
## 320x107-0x0          5.160317e-01  0.0575815895  0.974481903 0.0076697
## 320x480-0x0         -7.444444e-02 -0.5175915967  0.368702708 1.0000000
## 320x50-0x0          -1.163429e-01 -0.1731776339 -0.059508150 0.0000000
## 325x508-0x0         -1.808547e-01 -0.2926843747 -0.069025027 0.0000003
## 336x280-0x0          1.935401e-02 -0.1008523619  0.139560372 1.0000000
## 468x60-0x0          -1.154474e-01 -0.2420953025  0.011200514 0.1481709
## 480x360-0x0          2.588889e-01 -0.5044586904  1.022236468 0.9999996
## 600x300-0x0         -7.777778e-03 -0.9921657374  0.976610182 1.0000000
## 620x366-0x0         -1.411111e-01 -0.2871525108  0.004930289 0.0776015
## 640x360-0x0          3.255556e-01 -0.3716671454  1.022778257 0.9991233
## 640x480-0x0          4.500977e-01  0.2628574456  0.637337915 0.0000000
## 728x90-0x0           5.946401e-02 -0.0008346331  0.119762662 0.0600874
## 750x570-0x0          6.588889e-01 -1.0442318633  2.362009641 0.9999868
## 970x250-0x0          1.729451e-01  0.1067412676  0.239148960 0.0000000
## 970x500-0x0         -3.411111e-01 -0.9113328001  0.229110578 0.9485090
## 970x66-0x0          -3.411111e-01 -2.0442318633  1.362009641 1.0000000
## 970x90-0x0          -6.169935e-02 -0.2182999634  0.094901271 0.9999795
## 1080x566-1080x1080   9.319212e-13 -2.0847305445  2.084730545 1.0000000
## 1140x250-1080x1080  -3.333333e-01 -1.8871997374  1.220533071 1.0000000
## 1140x600-1080x1080  -1.000000e+00 -3.0847305445  1.084730545 0.9985134
## 1140x635-1080x1080   2.704503e-13 -1.5538664041  1.553866404 1.0000000
## 1200x250-1080x1080  -6.666667e-01 -2.2205330708  0.887199737 0.9998526
## 1237x500-1080x1080  -7.530864e-01 -1.9641131214  0.457940282 0.9176190
## 1280x1280-1080x1080 -1.000000e+00 -3.0847305445  1.084730545 0.9985134
## 160x600-1080x1080   -4.187726e-01 -1.6267296867  0.789184560 0.9999992
## 180x150-1080x1080   -7.500000e-01 -2.2241271050  0.724127105 0.9955999
## 1x1-1080x1080       -6.346154e-01 -1.8611632762  0.591932507 0.9941098
## 250x250-1080x1080   -5.000000e-01 -1.9741271050  0.974127105 0.9999996
## 300x100-1080x1080   -8.333333e-01 -2.1333912402  0.466724574 0.8848482
## 300x1050-1080x1080  -8.735632e-01 -2.0817857286  0.334659292 0.6698159
## 300x250-1080x1080   -6.487437e-01 -1.8523726238  0.554885187 0.9884321
## 300x251-1080x1080   -9.341693e-01 -2.1415562272  0.273217669 0.5031424
## 300x50-1080x1080    -7.207527e-01 -1.9244258654  0.482920451 0.9478638
## 300x600-1080x1080   -6.048465e-01 -1.8086319312  0.598938967 0.9964767
## 320x100-1080x1080   -9.166667e-01 -2.1261721139  0.292838781 0.5555522
## 320x106-1080x1080    4.665157e-13 -1.7021753617  1.702175362 1.0000000
## 320x107-1080x1080   -1.428571e-01 -1.4295807700  1.143866484 1.0000000
## 320x480-1080x1080   -7.333333e-01 -2.0146843958  0.548017729 0.9715508
## 320x50-1080x1080    -7.752318e-01 -1.9788560304  0.428392469 0.8788936
## 325x508-1080x1080   -8.397436e-01 -2.0472149238  0.367727744 0.7535795
## 336x280-1080x1080   -6.395349e-01 -1.8478108114  0.568741044 0.9913750
## 468x60-1080x1080    -7.743363e-01 -1.9832700445  0.434597478 0.8857681
## 480x360-1080x1080   -4.000000e-01 -1.8241420833  1.024142083 1.0000000
## 600x300-1080x1080   -6.666667e-01 -2.2205330708  0.887199737 0.9998526
## 620x366-1080x1080   -8.000000e-01 -2.0111190020  0.411119002 0.8439580
## 640x360-1080x1080   -3.333333e-01 -1.7231536963  1.056487030 1.0000000
## 640x480-1080x1080   -2.087912e-01 -1.4255656546  1.007983237 1.0000000
## 728x90-1080x1080    -5.994249e-01 -1.8032176610  0.604367913 0.9970063
## 750x570-1080x1080    5.966339e-13 -2.0847305445  2.084730545 1.0000000
## 970x250-1080x1080   -4.859438e-01 -1.6900468006  0.718159250 0.9999636
## 970x500-1080x1080   -1.000000e+00 -2.3306516904  0.330651690 0.5767240
## 970x66-1080x1080    -1.000000e+00 -3.0847305445  1.084730545 0.9985134
## 970x90-1080x1080    -7.205882e-01 -1.9330258213  0.491849351 0.9526289
## 1140x250-1080x566   -3.333333e-01 -2.2988361400  1.632169473 1.0000000
## 1140x600-1080x566   -1.000000e+00 -3.4072394821  1.407239482 0.9999287
## 1140x635-1080x566   -6.614709e-13 -1.9655028066  1.965502807 1.0000000
## 1200x250-1080x566   -6.666667e-01 -2.6321694733  1.298836140 0.9999996
## 1237x500-1080x566   -7.530864e-01 -2.4605073266  0.954334487 0.9997307
## 1280x1280-1080x566  -1.000000e+00 -3.4072394821  1.407239482 0.9999287
## 160x600-1080x566    -4.187726e-01 -2.1240176756  1.286472549 1.0000000
## 180x150-1080x566    -7.500000e-01 -2.6530899092  1.153089909 0.9999793
## 1x1-1080x566        -6.346154e-01 -2.3530798769  1.083849108 0.9999959
## 250x250-1080x566    -5.000000e-01 -2.4030899092  1.403089909 1.0000000
## 300x100-1080x566    -8.333333e-01 -2.6050136212  0.938346955 0.9989860
## 300x1050-1080x566   -8.735632e-01 -2.5789963350  0.831869898 0.9950479
## 300x250-1080x566    -6.487437e-01 -2.3509255604  1.053438123 0.9999909
## 300x251-1080x566    -9.341693e-01 -2.6390105396  0.770671982 0.9850243
## 300x50-1080x566     -7.207527e-01 -2.4229658410  0.981460426 0.9998901
## 300x600-1080x566    -6.048465e-01 -2.3071390212  1.097446057 0.9999985
## 320x100-1080x566    -9.166667e-01 -2.6230089266  0.789675593 0.9890262
## 320x106-1080x566    -4.654055e-13 -2.0847305445  2.084730545 1.0000000
## 320x107-1080x566    -1.428571e-01 -1.9047760325  1.619061747 1.0000000
## 320x480-1080x566    -7.333333e-01 -2.4913324876  1.024665821 0.9999216
## 320x50-1080x566     -7.752318e-01 -2.4774103305  0.926946769 0.9994715
## 325x508-1080x566    -8.397436e-01 -2.5446446143  0.865157435 0.9975495
## 336x280-1080x566    -6.395349e-01 -2.3450058446  1.065936077 0.9999939
## 468x60-1080x566     -7.743363e-01 -2.4802733630  0.931600797 0.9995070
## 480x360-1080x566    -4.000000e-01 -2.2646396849  1.464639685 1.0000000
## 600x300-1080x566    -6.666667e-01 -2.6321694733  1.298836140 0.9999996
## 620x366-1080x566    -8.000000e-01 -2.5074863742  0.907486374 0.9990618
## 640x360-1080x566    -3.333333e-01 -2.1718928571  1.505226190 1.0000000
## 640x480-1080x566    -2.087912e-01 -1.9202936286  1.502711211 1.0000000
## 728x90-1080x566     -5.994249e-01 -2.3017226022  1.102872854 0.9999988
## 750x570-1080x566    -3.352874e-13 -2.4072394821  2.407239482 1.0000000
## 970x250-1080x566    -4.859438e-01 -2.1884609048  1.216573355 1.0000000
## 970x500-1080x566    -1.000000e+00 -2.7942503734  0.794250373 0.9806289
## 970x66-1080x566     -1.000000e+00 -3.4072394821  1.407239482 0.9999287
## 970x90-1080x566     -7.205882e-01 -2.4290101331  0.987833663 0.9998993
## 1140x600-1140x250   -6.666667e-01 -2.6321694733  1.298836140 0.9999996
## 1140x635-1140x250    3.333333e-01 -1.0564870297  1.723153696 1.0000000
## 1200x250-1140x250   -3.333333e-01 -1.7231536963  1.056487030 1.0000000
## 1237x500-1140x250   -4.197531e-01 -1.4115622978  0.572056125 0.9998913
## 1280x1280-1140x250  -6.666667e-01 -2.6321694733  1.298836140 0.9999996
## 160x600-1140x250    -8.543923e-02 -1.0734980601  0.902619600 1.0000000
## 180x150-1140x250    -4.166667e-01 -1.7167245735  0.883391240 0.9999999
## 1x1-1140x250        -3.012821e-01 -1.3119845644  0.709420462 1.0000000
## 250x250-1140x250    -1.666667e-01 -1.4667245735  1.133391240 1.0000000
## 300x100-1140x250    -5.000000e-01 -1.5987494714  0.598749471 0.9994803
## 300x1050-1140x250   -5.402299e-01 -1.5286131477  0.448153378 0.9855931
## 300x250-1140x250    -3.154104e-01 -1.2981730123  0.667352242 0.9999999
## 300x251-1140x250    -6.008359e-01 -1.5881976232  0.386525732 0.9362829
## 300x50-1140x250     -3.874194e-01 -1.3702361988  0.595397451 0.9999792
## 300x600-1140x250    -2.715131e-01 -1.2544674951  0.711441197 1.0000000
## 320x100-1140x250    -5.833333e-01 -1.5732844769  0.406617810 0.9576473
## 320x106-1140x250     3.333333e-01 -1.2205330708  1.887199737 1.0000000
## 320x107-1140x250     1.904762e-01 -0.8924631117  1.273415493 1.0000000
## 320x480-1140x250    -4.000000e-01 -1.4765502240  0.676550224 0.9999952
## 320x50-1140x250     -4.418984e-01 -1.4246553726  0.540858478 0.9995948
## 325x508-1140x250    -5.064103e-01 -1.4938751227  0.481054610 0.9949435
## 336x280-1140x250    -3.062016e-01 -1.2946501110  0.682247010 1.0000000
## 468x60-1140x250     -4.410029e-01 -1.4302555359  0.548249636 0.9996621
## 480x360-1140x250    -6.666667e-02 -1.3097597899  1.176426457 1.0000000
## 600x300-1140x250    -3.333333e-01 -1.7231536963  1.056487030 1.0000000
## 620x366-1140x250    -4.666667e-01 -1.4585885773  0.525255244 0.9989816
## 640x360-1140x250    -1.001088e-12 -1.2036197411  1.203619741 1.0000000
## 640x480-1140x250     1.245421e-01 -0.8742771242  1.123361373 1.0000000
## 728x90-1140x250     -2.660915e-01 -1.2490548735  0.716871792 1.0000000
## 750x570-1140x250     3.333333e-01 -1.6321694733  2.298836140 1.0000000
## 970x250-1140x250    -1.526104e-01 -1.1359536858  0.830732802 1.0000000
## 970x500-1140x250    -6.666667e-01 -1.8014502412  0.468116908 0.9593017
## 970x66-1140x250     -6.666667e-01 -2.6321694733  1.298836140 0.9999996
## 970x90-1140x250     -3.872549e-01 -1.3807863507  0.606276547 0.9999842
## 1140x635-1140x600    1.000000e+00 -0.9655028066  2.965502807 0.9955999
## 1200x250-1140x600    3.333333e-01 -1.6321694733  2.298836140 1.0000000
## 1237x500-1140x600    2.469136e-01 -1.4605073266  1.954334487 1.0000000
## 1280x1280-1140x600  -1.793121e-12 -2.4072394821  2.407239482 1.0000000
## 160x600-1140x600     5.812274e-01 -1.1240176756  2.286472549 0.9999995
## 180x150-1140x600     2.500000e-01 -1.6530899092  2.153089909 1.0000000
## 1x1-1140x600         3.653846e-01 -1.3530798769  2.083849108 1.0000000
## 250x250-1140x600     5.000000e-01 -1.4030899092  2.403089909 1.0000000
## 300x100-1140x600     1.666667e-01 -1.6050136212  1.938346954 1.0000000
## 300x1050-1140x600    1.264368e-01 -1.5789963350  1.831869898 1.0000000
## 300x250-1140x600     3.512563e-01 -1.3509255604  2.053438123 1.0000000
## 300x251-1140x600     6.583072e-02 -1.6390105396  1.770671982 1.0000000
## 300x50-1140x600      2.792473e-01 -1.4229658410  1.981460426 1.0000000
## 300x600-1140x600     3.951535e-01 -1.3071390212  2.097446057 1.0000000
## 320x100-1140x600     8.333333e-02 -1.6230089266  1.789675593 1.0000000
## 320x106-1140x600     1.000000e+00 -1.0847305445  3.084730545 0.9985134
## 320x107-1140x600     8.571429e-01 -0.9047760325  2.619061747 0.9980559
## 320x480-1140x600     2.666667e-01 -1.4913324876  2.024665821 1.0000000
## 320x50-1140x600      2.247682e-01 -1.4774103305  1.926946769 1.0000000
## 325x508-1140x600     1.602564e-01 -1.5446446143  1.865157435 1.0000000
## 336x280-1140x600     3.604651e-01 -1.3450058446  2.065936077 1.0000000
## 468x60-1140x600      2.256637e-01 -1.4802733630  1.931600797 1.0000000
## 480x360-1140x600     6.000000e-01 -1.2646396849  2.464639685 0.9999999
## 600x300-1140x600     3.333333e-01 -1.6321694733  2.298836140 1.0000000
## 620x366-1140x600     2.000000e-01 -1.5074863742  1.907486374 1.0000000
## 640x360-1140x600     6.666667e-01 -1.1718928571  2.505226190 0.9999974
## 640x480-1140x600     7.912088e-01 -0.9202936286  2.502711211 0.9992829
## 728x90-1140x600      4.005751e-01 -1.3017226022  2.102872854 1.0000000
## 750x570-1140x600     1.000000e+00 -1.4072394821  3.407239482 0.9999287
## 970x250-1140x600     5.140562e-01 -1.1884609048  2.216573355 1.0000000
## 970x500-1140x600    -1.912692e-12 -1.7942503734  1.794250373 1.0000000
## 970x66-1140x600     -1.686873e-12 -2.4072394821  2.407239482 1.0000000
## 970x90-1140x600      2.794118e-01 -1.4290101331  1.987833663 1.0000000
## 1200x250-1140x635   -6.666667e-01 -2.0564870297  0.723153696 0.9985134
## 1237x500-1140x635   -7.530864e-01 -1.7448956312  0.238722792 0.5508337
## 1280x1280-1140x635  -1.000000e+00 -2.9655028066  0.965502807 0.9955999
## 160x600-1140x635    -4.187726e-01 -1.4068313934  0.569286267 0.9998877
## 180x150-1140x635    -7.500000e-01 -2.0500579069  0.550057907 0.9681971
## 1x1-1140x635        -6.346154e-01 -1.6453178977  0.376087128 0.9080547
## 250x250-1140x635    -5.000000e-01 -1.8000579069  0.800057907 0.9999886
## 300x100-1140x635    -8.333333e-01 -1.9320828047  0.265416138 0.5537166
## 300x1050-1140x635   -8.735632e-01 -1.8619464810  0.114820044 0.1990170
## 300x250-1140x635    -6.487437e-01 -1.6315063456  0.334018908 0.8449224
## 300x251-1140x635    -9.341693e-01 -1.9215309566  0.053192399 0.0994025
## 300x50-1140x635     -7.207527e-01 -1.7035695321  0.262064118 0.6365221
## 300x600-1140x635    -6.048465e-01 -1.5878008285  0.378107864 0.9271664
## 320x100-1140x635    -9.166667e-01 -1.9066178102  0.073284477 0.1259753
## 320x106-1140x635     1.960654e-13 -1.5538664041  1.553866404 1.0000000
## 320x107-1140x635    -1.428571e-01 -1.2257964451  0.940082159 1.0000000
## 320x480-1140x635    -7.333333e-01 -1.8098835574  0.343216891 0.7929938
## 320x50-1140x635     -7.752318e-01 -1.7579887059  0.207525145 0.4539237
## 325x508-1140x635    -8.397436e-01 -1.8272084561  0.147721277 0.2751788
## 336x280-1140x635    -6.395349e-01 -1.6279834443  0.348913677 0.8730680
## 468x60-1140x635     -7.743363e-01 -1.7635888692  0.214916303 0.4735792
## 480x360-1140x635    -4.000000e-01 -1.6430931233  0.843093123 0.9999999
## 600x300-1140x635    -6.666667e-01 -2.0564870297  0.723153696 0.9985134
## 620x366-1140x635    -8.000000e-01 -1.7919219106  0.191921911 0.3983862
## 640x360-1140x635    -3.333333e-01 -1.5369530744  0.870286408 1.0000000
## 640x480-1140x635    -2.087912e-01 -1.2076104576  0.790028040 1.0000000
## 728x90-1140x635     -5.994249e-01 -1.5823882068  0.383538458 0.9346245
## 750x570-1140x635     3.261835e-13 -1.9655028066  1.965502807 1.0000000
## 970x250-1140x635    -4.859438e-01 -1.4692870191  0.497399469 0.9973955
## 970x500-1140x635    -1.000000e+00 -2.1347835745  0.134783574 0.2043646
## 970x66-1140x635     -1.000000e+00 -2.9655028066  0.965502807 0.9955999
## 970x90-1140x635     -7.205882e-01 -1.7141196840  0.272943213 0.6625897
## 1237x500-1200x250   -8.641975e-02 -1.0782289645  0.905389458 1.0000000
## 1280x1280-1200x250  -3.333333e-01 -2.2988361400  1.632169473 1.0000000
## 160x600-1200x250     2.478941e-01 -0.7401647267  1.235952934 1.0000000
## 180x150-1200x250    -8.333333e-02 -1.3833912402  1.216724574 1.0000000
## 1x1-1200x250         3.205128e-02 -0.9786512310  1.042753795 1.0000000
## 250x250-1200x250     1.666667e-01 -1.1333912402  1.466724574 1.0000000
## 300x100-1200x250    -1.666667e-01 -1.2654161381  0.932082805 1.0000000
## 300x1050-1200x250   -2.068966e-01 -1.1952798143  0.781486711 1.0000000
## 300x250-1200x250     1.792295e-02 -0.9648396789  1.000685575 1.0000000
## 300x251-1200x250    -2.675026e-01 -1.2548642899  0.719859065 1.0000000
## 300x50-1200x250     -5.408604e-02 -1.0369028654  0.928730784 1.0000000
## 300x600-1200x250     6.182018e-02 -0.9211341618  1.044774531 1.0000000
## 320x100-1200x250    -2.500000e-01 -1.2399511435  0.739951144 1.0000000
## 320x106-1200x250     6.666667e-01 -0.8871997374  2.220533071 0.9998526
## 320x107-1200x250     5.238095e-01 -0.5591297784  1.606748826 0.9982565
## 320x480-1200x250    -6.666667e-02 -1.1432168907  1.009883557 1.0000000
## 320x50-1200x250     -1.085651e-01 -1.0913220393  0.874191811 1.0000000
## 325x508-1200x250    -1.730769e-01 -1.1605417894  0.814387943 1.0000000
## 336x280-1200x250     2.713178e-02 -0.9613167777  1.015580344 1.0000000
## 468x60-1200x250     -1.076696e-01 -1.0969222025  0.881582970 1.0000000
## 480x360-1200x250     2.666667e-01 -0.9764264566  1.509759790 1.0000000
## 600x300-1200x250     1.774136e-13 -1.3898203630  1.389820363 1.0000000
## 620x366-1200x250    -1.333333e-01 -1.1252552439  0.858588577 1.0000000
## 640x360-1200x250     3.333333e-01 -0.8702864077  1.536953074 1.0000000
## 640x480-1200x250     4.578755e-01 -0.5409437909  1.456694707 0.9993957
## 728x90-1200x250      6.724179e-02 -0.9157215401  1.050205125 1.0000000
## 750x570-1200x250     6.666667e-01 -1.2988361400  2.632169473 0.9999996
## 970x250-1200x250     1.807229e-01 -0.8026203524  1.164066136 1.0000000
## 970x500-1200x250    -3.333333e-01 -1.4681169078  0.801450241 1.0000000
## 970x66-1200x250     -3.333333e-01 -2.2988361400  1.632169473 1.0000000
## 970x90-1200x250     -5.392157e-02 -1.0474530173  0.939609880 1.0000000
## 1280x1280-1237x500  -2.469136e-01 -1.9543344871  1.460507327 1.0000000
## 160x600-1237x500     3.343139e-01  0.1659537592  0.502673954 0.0000000
## 180x150-1237x500     3.086420e-03 -0.8584444454  0.864617285 1.0000000
## 1x1-1237x500         1.184710e-01 -0.1528303478  0.389772418 0.9997830
## 250x250-1237x500     2.530864e-01 -0.6084444454  1.114617285 1.0000000
## 300x100-1237x500    -8.024691e-02 -0.5894966277  0.429002801 1.0000000
## 300x1050-1237x500   -1.204768e-01 -0.2907305629  0.049776966 0.7176504
## 300x250-1237x500     1.043427e-01 -0.0294752776  0.238160680 0.4833911
## 300x251-1237x500    -1.810829e-01 -0.3453020809 -0.016863638 0.0110835
## 300x50-1237x500      3.233371e-02 -0.1018817174  0.166549142 1.0000000
## 300x600-1237x500     1.482399e-01  0.0130211618  0.283458713 0.0122583
## 320x100-1237x500    -1.635802e-01 -0.3427118662  0.015551372 0.1455313
## 320x106-1237x500     7.530864e-01 -0.4579402819  1.964113121 0.9176190
## 320x107-1237x500     6.102293e-01  0.1360537791  1.084404775 0.0004282
## 320x480-1237x500     1.975309e-02 -0.4396435344  0.479149707 1.0000000
## 320x50-1237x500     -2.214536e-02 -0.1559214598  0.111630738 1.0000000
## 325x508-1237x500    -8.665717e-02 -0.2514956748  0.078181335 0.9922924
## 336x280-1237x500     1.135515e-01 -0.0570808976  0.284183970 0.8324911
## 468x60-1237x500     -2.124986e-02 -0.1964798533  0.153980126 1.0000000
## 480x360-1237x500     3.530864e-01 -0.4198077409  1.125980580 0.9994370
## 600x300-1237x500     8.641975e-02 -0.9053894583  1.078228964 1.0000000
## 620x366-1237x500    -4.691358e-02 -0.2366342885  0.142807128 1.0000000
## 640x360-1237x500     4.197531e-01 -0.2879088042  1.127414977 0.9538282
## 640x480-1237x500     5.442952e-01  0.3213047131  0.767285709 0.0000000
## 728x90-1237x500      1.536615e-01  0.0183774606  0.288945631 0.0064827
## 750x570-1237x500     7.530864e-01 -0.9543344871  2.460507327 0.9997307
## 970x250-1237x500     2.671426e-01  0.1291252328  0.405160056 0.0000000
## 970x500-1237x500    -2.469136e-01 -0.8298532267  0.336026066 0.9998892
## 970x66-1237x500     -2.469136e-01 -1.9543344871  1.460507327 1.0000000
## 970x90-1237x500      3.249818e-02 -0.1654654737  0.230461843 1.0000000
## 160x600-1280x1280    5.812274e-01 -1.1240176756  2.286472549 0.9999995
## 180x150-1280x1280    2.500000e-01 -1.6530899092  2.153089909 1.0000000
## 1x1-1280x1280        3.653846e-01 -1.3530798769  2.083849108 1.0000000
## 250x250-1280x1280    5.000000e-01 -1.4030899092  2.403089909 1.0000000
## 300x100-1280x1280    1.666667e-01 -1.6050136212  1.938346955 1.0000000
## 300x1050-1280x1280   1.264368e-01 -1.5789963350  1.831869898 1.0000000
## 300x250-1280x1280    3.512563e-01 -1.3509255604  2.053438123 1.0000000
## 300x251-1280x1280    6.583072e-02 -1.6390105396  1.770671982 1.0000000
## 300x50-1280x1280     2.792473e-01 -1.4229658410  1.981460426 1.0000000
## 300x600-1280x1280    3.951535e-01 -1.3071390212  2.097446057 1.0000000
## 320x100-1280x1280    8.333333e-02 -1.6230089266  1.789675593 1.0000000
## 320x106-1280x1280    1.000000e+00 -1.0847305445  3.084730545 0.9985134
## 320x107-1280x1280    8.571429e-01 -0.9047760325  2.619061747 0.9980559
## 320x480-1280x1280    2.666667e-01 -1.4913324876  2.024665821 1.0000000
## 320x50-1280x1280     2.247682e-01 -1.4774103305  1.926946769 1.0000000
## 325x508-1280x1280    1.602564e-01 -1.5446446143  1.865157435 1.0000000
## 336x280-1280x1280    3.604651e-01 -1.3450058446  2.065936077 1.0000000
## 468x60-1280x1280     2.256637e-01 -1.4802733630  1.931600797 1.0000000
## 480x360-1280x1280    6.000000e-01 -1.2646396849  2.464639685 0.9999999
## 600x300-1280x1280    3.333333e-01 -1.6321694733  2.298836140 1.0000000
## 620x366-1280x1280    2.000000e-01 -1.5074863742  1.907486374 1.0000000
## 640x360-1280x1280    6.666667e-01 -1.1718928571  2.505226190 0.9999974
## 640x480-1280x1280    7.912088e-01 -0.9202936286  2.502711211 0.9992829
## 728x90-1280x1280     4.005751e-01 -1.3017226022  2.102872854 1.0000000
## 750x570-1280x1280    1.000000e+00 -1.4072394821  3.407239482 0.9999287
## 970x250-1280x1280    5.140562e-01 -1.1884609048  2.216573355 1.0000000
## 970x500-1280x1280   -1.195710e-13 -1.7942503734  1.794250373 1.0000000
## 970x66-1280x1280     1.062483e-13 -2.4072394821  2.407239482 1.0000000
## 970x90-1280x1280     2.794118e-01 -1.4290101331  1.987833663 1.0000000
## 180x150-160x600     -3.312274e-01 -1.1884381296  0.525983256 0.9999872
## 1x1-160x600         -2.158428e-01 -0.4730959663  0.041410323 0.3047494
## 250x250-160x600     -8.122744e-02 -0.9384381296  0.775983256 1.0000000
## 300x100-160x600     -4.145608e-01 -0.9164671473  0.087345607 0.3408259
## 300x1050-160x600    -4.547907e-01 -0.6016275865 -0.307953724 0.0000000
## 300x250-160x600     -2.299712e-01 -0.3323527533 -0.127589558 0.0000000
## 300x251-160x600     -5.153967e-01 -0.6551918978 -0.375601534 0.0000000
## 300x50-160x600      -3.019801e-01 -0.4048806871 -0.199079601 0.0000000
## 300x600-160x600     -1.860739e-01 -0.2902797608 -0.081868077 0.0000000
## 320x100-160x600     -4.978941e-01 -0.6549382921 -0.340849915 0.0000000
## 320x106-160x600      4.187726e-01 -0.7891845604  1.626729687 0.9999992
## 320x107-160x600      2.759154e-01 -0.1903646917  0.742195532 0.9552445
## 320x480-160x600     -3.145608e-01 -0.7658034935  0.136681953 0.7488471
## 320x50-160x600      -3.564592e-01 -0.4587860702 -0.254132365 0.0000000
## 325x508-160x600     -4.209710e-01 -0.5614931701 -0.280448883 0.0000000
## 336x280-160x600     -2.207623e-01 -0.3680381419 -0.073486499 0.0000048
## 468x60-160x600      -3.555637e-01 -0.5081425177 -0.202984922 0.0000000
## 480x360-160x600      1.877256e-02 -0.7493030345  0.786848161 1.0000000
## 600x300-160x600     -2.478941e-01 -1.2359529337  0.740164727 1.0000000
## 620x366-160x600     -3.812274e-01 -0.5502501787 -0.212204695 0.0000000
## 640x360-160x600      8.543923e-02 -0.6169567351  0.787835195 1.0000000
## 640x480-160x600      2.099814e-01  0.0043129264  0.415649782 0.0375052
## 728x90-160x600      -1.806523e-01 -0.2849428848 -0.076361737 0.0000000
## 750x570-160x600      4.187726e-01 -1.2864725492  2.124017676 1.0000000
## 970x250-160x600     -6.717121e-02 -0.1749837627  0.040641339 0.9158063
## 970x500-160x600     -5.812274e-01 -1.1577631005 -0.004691773 0.0447783
## 970x66-160x600      -5.812274e-01 -2.2864725492  1.124017676 0.9999995
## 970x90-160x600      -3.018157e-01 -0.4800412318 -0.123590112 0.0000000
## 1x1-180x150          1.153846e-01 -0.7678308112  0.998600042 1.0000000
## 250x250-180x150      2.500000e-01 -0.9536197411  1.453619741 1.0000000
## 300x100-180x150     -8.333333e-02 -1.0660847366  0.899418070 1.0000000
## 300x1050-180x150    -1.235632e-01 -0.9811478461  0.734021409 1.0000000
## 300x250-180x150      1.012563e-01 -0.7498443594  0.952356922 1.0000000
## 300x251-180x150     -1.841693e-01 -1.0405763097  0.672237752 1.0000000
## 300x50-180x150       2.924729e-02 -0.8219159296  0.880410515 1.0000000
## 300x600-180x150      1.451535e-01 -0.7061684934  0.996475529 1.0000000
## 320x100-180x150     -1.666667e-01 -1.0260578386  0.692724505 1.0000000
## 320x106-180x150      7.500000e-01 -0.7241271050  2.224127105 0.9955999
## 320x107-180x150      6.071429e-01 -0.3578998632  1.572185577 0.9060021
## 320x480-180x150      1.666667e-02 -0.9412009153  0.974534249 1.0000000
## 320x50-180x150      -2.523178e-02 -0.8763258377  0.825862276 1.0000000
## 325x508-180x150     -8.974359e-02 -0.9462695859  0.766782406 1.0000000
## 336x280-180x150      1.104651e-01 -0.7471947678  0.968125000 1.0000000
## 468x60-180x150      -2.433628e-02 -0.8829226788  0.834250112 1.0000000
## 480x360-180x150      3.500000e-01 -0.7918539455  1.491853946 1.0000000
## 600x300-180x150      8.333333e-02 -1.2167245735  1.383391240 1.0000000
## 620x366-180x150     -5.000000e-02 -0.9116606040  0.811660604 1.0000000
## 640x360-180x150      4.166667e-01 -0.6820828047  1.515416138 0.9999919
## 640x480-180x150      5.412088e-01 -0.3283829587  1.410800541 0.9168332
## 728x90-180x150       1.505751e-01 -0.7007572611  1.001907513 1.0000000
## 750x570-180x150      7.500000e-01 -1.1530899092  2.653089909 0.9999793
## 970x250-180x150      2.640562e-01 -0.5877147862  1.115827236 1.0000000
## 970x500-180x150     -2.500000e-01 -1.2728800911  0.772880091 1.0000000
## 970x66-180x150      -2.500000e-01 -2.1530899092  1.653089909 1.0000000
## 970x90-180x150       2.941176e-02 -0.8341012111  0.892924741 1.0000000
## 250x250-1x1          1.346154e-01 -0.7486000420  1.017830811 1.0000000
## 300x100-1x1         -1.987179e-01 -0.7438503450  0.346414448 0.9999970
## 300x1050-1x1        -2.389478e-01 -0.4974442600  0.019548592 0.1282976
## 300x250-1x1         -1.412833e-02 -0.2502243093  0.221967641 1.0000000
## 300x251-1x1         -2.995539e-01 -0.5541162854 -0.044991503 0.0032433
## 300x50-1x1          -8.613732e-02 -0.3224587981  0.150184153 0.9999970
## 300x600-1x1          2.976890e-02 -0.2071238485  0.266661653 1.0000000
## 320x100-1x1         -2.820513e-01 -0.5464793350 -0.017623229 0.0193323
## 320x106-1x1          6.346154e-01 -0.5919325070  1.861163276 0.9941098
## 320x107-1x1          4.917582e-01 -0.0207614758  1.004277959 0.0844267
## 320x480-1x1         -9.871795e-02 -0.5975960034  0.400160106 1.0000000
## 320x50-1x1          -1.406164e-01 -0.3766886366  0.095455845 0.9512550
## 325x508-1x1         -2.051282e-01 -0.4600905373  0.049834127 0.4044442
## 336x280-1x1         -4.919499e-03 -0.2636654857  0.253826487 1.0000000
## 468x60-1x1          -1.397209e-01 -0.4015216005  0.122079803 0.9901283
## 480x360-1x1          2.346154e-01 -0.5623786189  1.031609388 1.0000000
## 600x300-1x1         -3.205128e-02 -1.0427537951  0.978651231 1.0000000
## 620x366-1x1         -1.653846e-01 -0.4370977091  0.106328478 0.9360951
## 640x360-1x1          3.012821e-01 -0.4326248976  1.035189000 0.9999457
## 640x480-1x1          4.258242e-01  0.1299207789  0.721727573 0.0000185
## 728x90-1x1           3.519051e-02 -0.2017395252  0.272120546 1.0000000
## 750x570-1x1          6.346154e-01 -1.0838491077  2.353079877 0.9999959
## 970x250-1x1          1.486716e-01 -0.0898296779  0.387172897 0.9153088
## 970x500-1x1         -3.653846e-01 -0.9799191462  0.249149915 0.9523800
## 970x66-1x1          -3.653846e-01 -2.0838491077  1.353079877 1.0000000
## 970x90-1x1          -8.597285e-02 -0.3635042329  0.191558532 1.0000000
## 300x100-250x250     -3.333333e-01 -1.3160847366  0.649418070 0.9999996
## 300x1050-250x250    -3.735632e-01 -1.2311478461  0.484021409 0.9997943
## 300x250-250x250     -1.487437e-01 -0.9998443594  0.702356922 1.0000000
## 300x251-250x250     -4.341693e-01 -1.2905763097  0.422237752 0.9958680
## 300x50-250x250      -2.207527e-01 -1.0719159296  0.630410515 1.0000000
## 300x600-250x250     -1.048465e-01 -0.9561684934  0.746475529 1.0000000
## 320x100-250x250     -4.166667e-01 -1.2760578386  0.442724505 0.9981763
## 320x106-250x250      5.000000e-01 -0.9741271050  1.974127105 0.9999996
## 320x107-250x250      3.571429e-01 -0.6078998632  1.322185577 0.9999956
## 320x480-250x250     -2.333333e-01 -1.1912009153  0.724534249 1.0000000
## 320x50-250x250      -2.752318e-01 -1.1263258377  0.575862276 0.9999999
## 325x508-250x250     -3.397436e-01 -1.1962695859  0.516782406 0.9999758
## 336x280-250x250     -1.395349e-01 -0.9971947678  0.718125000 1.0000000
## 468x60-250x250      -2.743363e-01 -1.1329226788  0.584250112 0.9999999
## 480x360-250x250      1.000000e-01 -1.0418539455  1.241853946 1.0000000
## 600x300-250x250     -1.666667e-01 -1.4667245735  1.133391240 1.0000000
## 620x366-250x250     -3.000000e-01 -1.1616606040  0.561660604 0.9999991
## 640x360-250x250      1.666667e-01 -0.9320828047  1.265416138 1.0000000
## 640x480-250x250      2.912088e-01 -0.5783829587  1.160800541 0.9999997
## 728x90-250x250      -9.942487e-02 -0.9507572611  0.751907513 1.0000000
## 750x570-250x250      5.000000e-01 -1.4030899092  2.403089909 1.0000000
## 970x250-250x250      1.405622e-02 -0.8377147862  0.865827236 1.0000000
## 970x500-250x250     -5.000000e-01 -1.5228800911  0.522880091 0.9978730
## 970x66-250x250      -5.000000e-01 -2.4030899092  1.403089909 1.0000000
## 970x90-250x250      -2.205882e-01 -1.0841012111  0.642924741 1.0000000
## 300x1050-300x100    -4.022989e-02 -0.5427746426  0.462314872 1.0000000
## 300x250-300x100      1.845896e-01 -0.3068085339  0.675987763 0.9999936
## 300x251-300x100     -1.008359e-01 -0.6013685039  0.399696613 1.0000000
## 300x50-300x100       1.125806e-01 -0.3789259056  0.604087158 1.0000000
## 300x600-300x100      2.284869e-01 -0.2632946108  0.720268313 0.9992067
## 320x100-300x100     -8.333333e-02 -0.5889547590  0.422288092 1.0000000
## 320x106-300x100      8.333333e-01 -0.4667245735  2.133391240 0.8848482
## 320x107-300x100      6.904762e-01  0.0208439841  1.360108397 0.0325036
## 320x480-300x100      1.000000e-01 -0.5592496828  0.759249683 1.0000000
## 320x50-300x100       5.810155e-02 -0.4332851928  0.549488298 1.0000000
## 325x508-300x100     -6.410256e-03 -0.5071463363  0.494325823 1.0000000
## 336x280-300x100      1.937984e-01 -0.3088747210  0.696471620 0.9999879
## 468x60-300x100       5.899705e-02 -0.4452553057  0.563249406 1.0000000
## 480x360-300x100      4.333333e-01 -0.4727186920  1.339385359 0.9985956
## 600x300-300x100      1.666667e-01 -0.9320828047  1.265416138 1.0000000
## 620x366-300x100      3.333333e-02 -0.4761358377  0.542802504 1.0000000
## 640x360-300x100      5.000000e-01 -0.3510876809  1.351087681 0.9593017
## 640x480-300x100      6.245421e-01  0.1017709976  1.147313252 0.0023690
## 728x90-300x100       2.339085e-01 -0.2578909641  0.725707882 0.9987390
## 750x570-300x100      8.333333e-01 -0.9383469545  2.605013621 0.9989860
## 970x250-300x100      3.473896e-01 -0.1451687581  0.839947875 0.7247470
## 970x500-300x100     -1.666667e-01 -0.9172554492  0.583922116 1.0000000
## 970x66-300x100      -1.666667e-01 -1.9383469545  1.605013621 1.0000000
## 970x90-300x100       1.127451e-01 -0.3998507457  0.625340942 1.0000000
## 300x250-300x1050     2.248195e-01  0.1193528538  0.330286146 0.0000000
## 300x251-300x1050    -6.060606e-02 -0.2026761657  0.081464044 0.9998701
## 300x50-300x1050      1.528105e-01  0.0468400265  0.258780996 0.0000173
## 300x600-300x1050     2.687167e-01  0.1614783135  0.375955159 0.0000000
## 320x100-300x1050    -4.310345e-02 -0.2021760701  0.115969174 1.0000000
## 320x106-300x1050     8.735632e-01 -0.3346592918  2.081785729 0.6698159
## 320x107-300x1050     7.307061e-01  0.2637388768  1.197673274 0.0000010
## 320x480-300x1050     1.402299e-01 -0.3117227864  0.592182557 1.0000000
## 320x50-300x1050      9.833144e-02 -0.0070820653  0.203744941 0.1163860
## 325x508-300x1050     3.381963e-02 -0.1089658561  0.176605113 1.0000000
## 336x280-300x1050     2.340283e-01  0.0845914269  0.383465242 0.0000010
## 468x60-300x1050      9.922694e-02 -0.0554388702  0.253892741 0.8837810
## 480x360-300x1050     4.735632e-01 -0.2949296869  1.242056124 0.9259139
## 600x300-300x1050     2.068966e-01 -0.7814867109  1.195279814 1.0000000
## 620x366-300x1050     7.356322e-02 -0.0973458484  0.244472285 0.9998417
## 640x360-300x1050     5.402299e-01 -0.1626223850  1.243082155 0.5199266
## 640x480-300x1050     6.647720e-01  0.4575505722  0.871993447 0.0000000
## 728x90-300x1050      2.741383e-01  0.1668175837  0.381459105 0.0000000
## 750x570-300x1050     8.735632e-01 -0.8318698982  2.578996335 0.9950479
## 970x250-300x1050     3.876194e-01  0.2768730305  0.498365856 0.0000000
## 970x500-300x1050    -1.264368e-01 -0.7035282764  0.450654713 1.0000000
## 970x66-300x1050     -1.264368e-01 -1.8318698982  1.578996335 1.0000000
## 970x90-300x1050      1.529750e-01 -0.0270404939  0.332990460 0.2767329
## 300x251-300x250     -2.854256e-01 -0.3808447511 -0.190006370 0.0000000
## 300x50-300x250      -7.200899e-02 -0.0842829775 -0.059735000 0.0000000
## 300x600-300x250      4.389724e-02  0.0233792808  0.064415192 0.0000000
## 320x100-300x250     -2.679229e-01 -0.3871915785 -0.148654318 0.0000000
## 320x106-300x250      6.487437e-01 -0.5548851866  1.852372624 0.9884321
## 320x107-300x250      5.058866e-01  0.0509368292  0.960836322 0.0095620
## 320x480-300x250     -8.458961e-02 -0.5241144996  0.354935270 1.0000000
## 320x50-300x250      -1.264881e-01 -0.1322251194 -0.120751005 0.0000000
## 325x508-300x250     -1.909999e-01 -0.2874809670 -0.094518775 0.0000000
## 336x280-300x250      9.208835e-03 -0.0968680080  0.115285678 1.0000000
## 468x60-300x250      -1.255926e-01 -0.2389169598 -0.012268169 0.0101453
## 480x360-300x250      2.487437e-01 -0.5125067348  1.009994172 0.9999998
## 600x300-300x250     -1.792295e-02 -1.0006855751  0.964839679 1.0000000
## 620x366-300x250     -1.512563e-01 -0.2859070011 -0.016605562 0.0079595
## 640x360-300x250      3.154104e-01 -0.3795156688  1.010336439 0.9995077
## 640x480-300x250      4.399525e-01  0.2614542428  0.618450777 0.0000000
## 728x90-300x250       4.931884e-02  0.0283748040  0.070262885 0.0000000
## 750x570-300x250      6.487437e-01 -1.0534381232  2.350925560 0.9999909
## 970x250-300x250      1.627999e-01  0.1283663095  0.197233578 0.0000000
## 970x500-300x250     -3.512563e-01 -0.9186675084  0.216154946 0.9218270
## 970x66-300x250      -3.512563e-01 -2.0534381232  1.350925560 1.0000000
## 970x90-300x250      -7.184452e-02 -0.2178803998  0.074191366 0.9976026
## 300x50-300x251       2.134166e-01  0.1174407825  0.309392361 0.0000000
## 300x600-300x251      3.293228e-01  0.2319488381  0.426696756 0.0000000
## 320x100-300x251      1.750261e-02 -0.1350939392  0.170099164 1.0000000
## 320x106-300x251      9.341693e-01 -0.2732176692  2.141556227 0.5031424
## 320x107-300x251      7.913121e-01  0.3265111322  1.256113140 0.0000000
## 320x480-300x251      2.008359e-01 -0.2488782144  0.650550106 0.9996491
## 320x50-300x251       1.589375e-01  0.0635770498  0.254297947 0.0000001
## 325x508-300x251      9.442569e-02 -0.0411077132  0.229959092 0.7500089
## 336x280-300x251      2.946344e-01  0.1521107203  0.437158070 0.0000000
## 468x60-300x251       1.598330e-01  0.0118359692  0.307830022 0.0157903
## 480x360-300x251      5.341693e-01 -0.2330092883  1.301347846 0.7512179
## 600x300-300x251      2.675026e-01 -0.7198590652  1.254864290 1.0000000
## 620x366-300x251      1.341693e-01 -0.0307292281  0.299067786 0.3768010
## 640x360-300x251      6.008359e-01 -0.1005789971  1.302250888 0.2598202
## 640x480-300x251      7.253781e-01  0.5230853797  0.927670761 0.0000000
## 728x90-300x251       3.347444e-01  0.2372797745  0.432209035 0.0000000
## 750x570-300x251      9.341693e-01 -0.7706719816  2.639010540 0.9850243
## 970x250-300x251      4.482255e-01  0.3470011171  0.549449891 0.0000000
## 970x500-300x251     -6.583072e-02 -0.6411707960  0.509509354 1.0000000
## 970x66-300x251      -6.583072e-02 -1.7706719816  1.639010540 1.0000000
## 970x90-300x251       2.135810e-01  0.0392618523  0.387900235 0.0013679
## 300x600-300x50       1.159062e-01  0.0929384936  0.138873957 0.0000000
## 320x100-300x50      -1.959140e-01 -0.3156283545 -0.076199564 0.0000002
## 320x106-300x50       7.207527e-01 -0.4829204508  1.924425865 0.9478638
## 320x107-300x50       5.778956e-01  0.1228287538  1.032962375 0.0005939
## 320x480-300x50      -1.258063e-02 -0.4522266822  0.427065430 1.0000000
## 320x50-300x50       -5.447907e-02 -0.0662877147 -0.042670432 0.0000000
## 325x508-300x50      -1.189909e-01 -0.2160224858 -0.021959279 0.0013411
## 336x280-300x50       8.121782e-02 -0.0253599733  0.187795620 0.5417129
## 468x60-300x50       -5.358358e-02 -0.1673770236  0.060209872 0.9989636
## 480x360-300x50       3.207527e-01 -0.4405677134  1.082073128 0.9999018
## 600x300-300x50       5.408604e-02 -0.9287307843  1.036902865 1.0000000
## 620x366-300x50      -7.924729e-02 -0.2142930128  0.055798427 0.9599104
## 640x360-300x50       3.874194e-01 -0.3075833245  1.082422072 0.9805771
## 640x480-300x50       5.119615e-01  0.3331650737  0.690757923 0.0000000
## 728x90-300x50        1.213278e-01  0.0979786787  0.144676987 0.0000000
## 750x570-300x50       7.207527e-01 -0.9814604265  2.422965841 0.9998901
## 970x250-300x50       2.348089e-01  0.1988616745  0.270756190 0.0000000
## 970x500-300x50      -2.792473e-01 -0.8467523859  0.288257800 0.9975938
## 970x66-300x50       -2.792473e-01 -1.9814604265  1.422965841 1.0000000
## 970x90-300x50        1.644720e-04 -0.1462356965  0.146564640 1.0000000
## 320x100-300x600     -3.118202e-01 -0.4326583907 -0.190981978 0.0000000
## 320x106-300x600      6.048465e-01 -0.5989389668  1.808631931 0.9964767
## 320x107-300x600      4.619893e-01  0.0066255970  0.917353082 0.0410036
## 320x480-300x600     -1.284869e-01 -0.5684402468  0.311466544 1.0000000
## 320x50-300x600      -1.703853e-01 -0.1906283149 -0.150142282 0.0000000
## 325x508-300x600     -2.348971e-01 -0.3333118816 -0.136482334 0.0000000
## 336x280-300x600     -3.468840e-02 -0.1425269961  0.073150193 0.9999999
## 468x60-300x600      -1.694898e-01 -0.2844649481 -0.054514654 0.0000085
## 480x360-300x600      2.048465e-01 -0.5566514618  0.966344426 1.0000000
## 600x300-300x600     -6.182018e-02 -1.0447745308  0.921134162 1.0000000
## 620x366-300x600     -1.951535e-01 -0.3311964603 -0.059110575 0.0000204
## 640x360-300x600      2.715131e-01 -0.4236840078  0.966710305 0.9999834
## 640x480-300x600      3.960553e-01  0.2165044527  0.575606094 0.0000000
## 728x90-300x600       5.421608e-03 -0.0231356803  0.033978896 1.0000000
## 750x570-300x600      6.048465e-01 -1.0974460568  2.307139021 0.9999985
## 970x250-300x600      1.189027e-01  0.0793737172  0.158431697 0.0000000
## 970x500-300x600     -3.951535e-01 -0.9628967401  0.172589704 0.7519958
## 970x66-300x600      -3.951535e-01 -2.0974460568  1.307139021 1.0000000
## 970x90-300x600      -1.157418e-01 -0.2630623051  0.031578799 0.4641930
## 320x106-320x100      9.166667e-01 -0.2928387806  2.126172114 0.5555522
## 320x107-320x100      7.738095e-01  0.3035328420  1.244086206 0.0000001
## 320x480-320x100      1.833333e-01 -0.2720379548  0.638704621 0.9999656
## 320x50-320x100       1.414349e-01  0.0222132462  0.260656526 0.0027445
## 325x508-320x100      7.692308e-02 -0.0763397282  0.230185882 0.9965454
## 336x280-320x100      2.771318e-01  0.1176539407  0.436609625 0.0000000
## 468x60-320x100       1.423304e-01 -0.0220572807  0.306718048 0.2380222
## 480x360-320x100      5.166667e-01 -0.2538416969  1.287175030 0.8202053
## 600x300-320x100      2.500000e-01 -0.7399511435  1.239951144 1.0000000
## 620x366-320x100      1.166667e-01 -0.0630878935  0.296421227 0.8689412
## 640x360-320x100      5.833333e-01 -0.1217220590  1.288388726 0.3368935
## 640x480-320x100      7.078755e-01  0.4933002568  0.922450659 0.0000000
## 728x90-320x100       3.172418e-01  0.1963305093  0.438153076 0.0000000
## 750x570-320x100      9.166667e-01 -0.7896755933  2.623008927 0.9890262
## 970x250-320x100      4.307229e-01  0.3067609590  0.554684824 0.0000000
## 970x500-320x100     -8.333333e-02 -0.6631060355  0.496439369 1.0000000
## 970x66-320x100      -8.333333e-02 -1.7896755933  1.623008927 1.0000000
## 970x90-320x100       1.960784e-01  0.0076444516  0.384512411 0.0283922
## 320x107-320x106     -1.428571e-01 -1.4295807700  1.143866484 1.0000000
## 320x480-320x106     -7.333333e-01 -2.0146843958  0.548017729 0.9715508
## 320x50-320x106      -7.752318e-01 -1.9788560304  0.428392469 0.8788936
## 325x508-320x106     -8.397436e-01 -2.0472149238  0.367727744 0.7535795
## 336x280-320x106     -6.395349e-01 -1.8478108114  0.568741044 0.9913750
## 468x60-320x106      -7.743363e-01 -1.9832700445  0.434597478 0.8857681
## 480x360-320x106     -4.000000e-01 -1.8241420833  1.024142083 1.0000000
## 600x300-320x106     -6.666667e-01 -2.2205330708  0.887199737 0.9998526
## 620x366-320x106     -8.000000e-01 -2.0111190020  0.411119002 0.8439580
## 640x360-320x106     -3.333333e-01 -1.7231536963  1.056487030 1.0000000
## 640x480-320x106     -2.087912e-01 -1.4255656546  1.007983237 1.0000000
## 728x90-320x106      -5.994249e-01 -1.8032176610  0.604367913 0.9970063
## 750x570-320x106      1.301181e-13 -2.0847305445  2.084730545 1.0000000
## 970x250-320x106     -4.859438e-01 -1.6900468006  0.718159250 0.9999636
## 970x500-320x106     -1.000000e+00 -2.3306516905  0.330651690 0.5767240
## 970x66-320x106      -1.000000e+00 -3.0847305445  1.084730545 0.9985134
## 970x90-320x106      -7.205882e-01 -1.9330258213  0.491849351 0.9526289
## 320x480-320x107     -5.904762e-01 -1.2230244303  0.042072049 0.1154770
## 320x50-320x107      -6.323746e-01 -1.0873120677 -0.177437208 0.0000530
## 325x508-320x107     -6.968864e-01 -1.1619066111 -0.231866283 0.0000049
## 336x280-320x107     -4.966777e-01 -0.9637831335 -0.029572348 0.0203244
## 468x60-320x107      -6.314791e-01 -1.1002835450 -0.162674736 0.0001296
## 480x360-320x107     -2.571429e-01 -1.1439559175  0.629670203 1.0000000
## 600x300-320x107     -5.238095e-01 -1.6067488260  0.559129778 0.9982565
## 620x366-320x107     -6.571429e-01 -1.1315540370 -0.182731677 0.0000587
## 640x360-320x107     -1.904762e-01 -1.0210527203  0.640100339 1.0000000
## 640x480-320x107     -6.593407e-02 -0.5546024428  0.422734311 1.0000000
## 728x90-320x107      -4.565677e-01 -0.9119508712 -0.001184591 0.0482765
## 750x570-320x107      1.428571e-01 -1.6190617468  1.904776033 1.0000000
## 970x250-320x107     -3.430866e-01 -0.7992892478  0.113115983 0.5749405
## 970x500-320x107     -8.571429e-01 -1.5843915435 -0.129894171 0.0031397
## 970x66-320x107      -8.571429e-01 -2.6190617468  0.904776033 0.9980559
## 970x90-320x107      -5.777311e-01 -1.0554984316 -0.099963753 0.0018276
## 320x50-320x480      -4.189845e-02 -0.4814105833  0.397613689 1.0000000
## 325x508-320x480     -1.064103e-01 -0.5563509255  0.343530413 1.0000000
## 336x280-320x480      9.379845e-02 -0.3582970054  0.545893905 1.0000000
## 468x60-320x480      -4.100295e-02 -0.4948536107  0.412847711 1.0000000
## 480x360-320x480      3.333333e-01 -0.5456662438  1.212332910 0.9999919
## 600x300-320x480      6.666667e-02 -1.0098835574  1.143216891 1.0000000
## 620x366-320x480     -6.666667e-02 -0.5263065476  0.392973214 1.0000000
## 640x360-320x480      4.000000e-01 -0.4222288152  1.222228815 0.9980559
## 640x480-320x480      5.245421e-01  0.0502008601  0.998883389 0.0105445
## 728x90-320x480       1.339085e-01 -0.3060650135  0.573881932 1.0000000
## 750x570-320x480      7.333333e-01 -1.0246658209  2.491332488 0.9999216
## 970x250-320x480      2.473896e-01 -0.1934320372  0.688211154 0.9785436
## 970x500-320x480     -2.666667e-01 -0.9843668160  0.451033483 0.9999952
## 970x66-320x480      -2.666667e-01 -2.0246658209  1.491332488 1.0000000
## 970x90-320x480       1.274510e-02 -0.4503580039  0.475848200 1.0000000
## 325x508-320x50      -6.451181e-02 -0.1609348098  0.031911192 0.8238902
## 336x280-320x50       1.356969e-01  0.0296728911  0.241720903 0.0004907
## 468x60-320x50        8.954975e-04 -0.1123794413  0.114170436 1.0000000
## 480x360-320x50       3.752318e-01 -0.3860113119  1.136474873 0.9975149
## 600x300-320x50       1.085651e-01 -0.8741918113  1.091322039 1.0000000
## 620x366-320x50      -2.476822e-02 -0.1593773184  0.109840880 1.0000000
## 640x360-320x50       4.418984e-01 -0.2530195434  1.136816438 0.8942777
## 640x480-320x50       5.664406e-01  0.3879736995  0.744907444 0.0000000
## 728x90-320x50        1.758069e-01  0.1551321383  0.196481675 0.0000000
## 750x570-320x50       7.752318e-01 -0.9269467692  2.477410331 0.9994715
## 970x250-320x50       2.892880e-01  0.2550174881  0.323558523 0.0000000
## 970x500-320x50      -2.247682e-01 -0.7921695709  0.342633132 0.9999766
## 970x66-320x50       -2.247682e-01 -1.9269467692  1.477410331 1.0000000
## 970x90-320x50        5.464355e-02 -0.0913539627  0.200641053 0.9999942
## 336x280-325x508      2.002087e-01  0.0569719166  0.343445495 0.0000450
## 468x60-325x508       6.540731e-02 -0.0832765856  0.214091199 0.9997455
## 480x360-325x508      4.397436e-01 -0.3275677773  1.207054957 0.9709975
## 600x300-325x508      1.730769e-01 -0.8143879433  1.160541789 1.0000000
## 620x366-325x508      3.974359e-02 -0.1257716590  0.205258838 1.0000000
## 640x360-325x508      5.064103e-01 -0.1951499348  1.207970448 0.6735673
## 640x480-325x508      6.309524e-01  0.4281566400  0.833748122 0.0000000
## 728x90-325x508       2.403187e-01  0.1418142280  0.338823203 0.0000000
## 750x570-325x508      8.397436e-01 -0.8651574349  2.544644614 0.9975495
## 970x250-325x508      3.537998e-01  0.2515738082  0.456025821 0.0000000
## 970x500-325x508     -1.602564e-01 -0.7357735532  0.415260733 1.0000000
## 970x66-325x508      -1.602564e-01 -1.8651574349  1.544644614 1.0000000
## 970x90-325x508       1.191554e-01 -0.0557473625  0.294058071 0.7927865
## 468x60-336x280      -1.348014e-01 -0.2898839405  0.020281142 0.2302888
## 480x360-336x280      2.395349e-01 -0.5290420016  1.008111769 1.0000000
## 600x300-336x280     -2.713178e-02 -1.0155803436  0.961316778 1.0000000
## 620x366-336x280     -1.604651e-01 -0.3317514038  0.010821171 0.1110454
## 640x360-336x280      3.062016e-01 -0.3967425417  1.009145643 0.9997943
## 640x480-336x280      4.307437e-01  0.2232110093  0.638276341 0.0000000
## 728x90-336x280       4.011001e-02 -0.0678104649  0.148030484 0.9999951
## 750x570-336x280      6.395349e-01 -1.0659360772  2.345005845 0.9999939
## 970x250-336x280      1.535911e-01  0.0422634340  0.264918783 0.0000660
## 970x500-336x280     -3.604651e-01 -0.9376684397  0.216738207 0.9135157
## 970x66-336x280      -3.604651e-01 -2.0659360772  1.345005845 1.0000000
## 970x90-336x280      -8.105335e-02 -0.2614270058  0.099320303 0.9996002
## 480x360-468x60       3.743363e-01 -0.3952743656  1.143946932 0.9980623
## 600x300-468x60       1.076696e-01 -0.8815829695  1.096922203 1.0000000
## 620x366-468x60      -2.566372e-02 -0.2015304682  0.150203035 1.0000000
## 640x360-468x60       4.410029e-01 -0.2630712782  1.145077178 0.9105561
## 640x480-468x60       5.655451e-01  0.3542161146  0.776874034 0.0000000
## 728x90-468x60        1.749114e-01  0.0598594609  0.289963357 0.0000030
## 750x570-468x60       7.743363e-01 -0.9316007967  2.480273363 0.9995070
## 970x250-468x60       2.883925e-01  0.1701386579  0.406646358 0.0000000
## 970x500-468x60      -2.256637e-01 -0.8042428371  0.352915404 0.9999839
## 970x66-468x60       -2.256637e-01 -1.9316007967  1.480273363 1.0000000
## 970x90-468x60        5.374805e-02 -0.1309808808  0.238476977 1.0000000
## 600x300-480x360     -2.666667e-01 -1.5097597899  0.976426457 1.0000000
## 620x366-480x360     -4.000000e-01 -1.1730387755  0.373038776 0.9941022
## 640x360-480x360      6.666667e-02 -0.9640517007  1.097385034 1.0000000
## 640x480-480x360      1.912088e-01 -0.5906606118  0.973078194 1.0000000
## 728x90-480x360      -1.994249e-01 -0.9609344178  0.562084669 1.0000000
## 750x570-480x360      4.000000e-01 -1.4646396849  2.264639685 1.0000000
## 970x250-480x360     -8.594378e-02 -0.8479436486  0.676056098 1.0000000
## 970x500-480x360     -6.000000e-01 -1.5494280555  0.349428056 0.9012172
## 970x66-480x360      -6.000000e-01 -2.4646396849  1.264639685 0.9999999
## 970x90-480x360      -3.205882e-01 -1.0956912035  0.454514733 0.9999355
## 620x366-600x300     -1.333333e-01 -1.1252552439  0.858588577 1.0000000
## 640x360-600x300      3.333333e-01 -0.8702864077  1.536953074 1.0000000
## 640x480-600x300      4.578755e-01 -0.5409437909  1.456694707 0.9993957
## 728x90-600x300       6.724179e-02 -0.9157215401  1.050205125 1.0000000
## 750x570-600x300      6.666667e-01 -1.2988361400  2.632169473 0.9999996
## 970x250-600x300      1.807229e-01 -0.8026203525  1.164066136 1.0000000
## 970x500-600x300     -3.333333e-01 -1.4681169078  0.801450241 1.0000000
## 970x66-600x300      -3.333333e-01 -2.2988361400  1.632169473 1.0000000
## 970x90-600x300      -5.392157e-02 -1.0474530173  0.939609880 1.0000000
## 640x360-620x366      4.666667e-01 -0.2411531666  1.174486500 0.8467817
## 640x480-620x366      5.912088e-01  0.3677175676  0.814700015 0.0000000
## 728x90-620x366       2.005751e-01  0.0644672696  0.336682982 0.0000086
## 750x570-620x366      8.000000e-01 -0.9074863742  2.507486374 0.9990618
## 970x250-620x366      3.140562e-01  0.1752312601  0.452881190 0.0000000
## 970x500-620x366     -2.000000e-01 -0.7831313713  0.383131371 0.9999994
## 970x66-620x366      -2.000000e-01 -1.9074863742  1.507486374 1.0000000
## 970x90-620x366       7.941176e-02 -0.1191157519  0.277939281 0.9999705
## 640x480-640x360      1.245421e-01 -0.5929115195  0.841995769 1.0000000
## 728x90-640x360      -2.660915e-01 -0.9613014033  0.429118322 0.9999899
## 750x570-640x360      3.333333e-01 -1.5052261904  2.171892857 1.0000000
## 970x250-640x360     -1.526104e-01 -0.8483573607  0.543136477 1.0000000
## 970x500-640x360     -6.666667e-01 -1.5637918534  0.230458520 0.6045215
## 970x66-640x360      -6.666667e-01 -2.5052261904  1.171892857 0.9999974
## 970x90-640x360      -3.872549e-01 -1.0973285455  0.322818742 0.9860788
## 728x90-640x480      -3.906337e-01 -0.5702336752 -0.211033656 0.0000000
## 750x570-640x480      2.087912e-01 -1.5027112111  1.920293629 1.0000000
## 970x250-640x480     -2.771526e-01 -0.4588203561 -0.095484776 0.0000026
## 970x500-640x480     -7.912088e-01 -1.3859969983 -0.196420584 0.0001820
## 970x66-640x480      -7.912088e-01 -2.5027112111  0.920293629 0.9992829
## 970x90-640x480      -5.117970e-01 -0.7423268290 -0.281267224 0.0000000
## 750x570-728x90       5.994249e-01 -1.1028728538  2.301722602 0.9999988
## 970x250-728x90       1.134811e-01  0.0737292770  0.153232921 0.0000000
## 970x500-728x90      -4.005751e-01 -0.9683339062  0.167183655 0.7239621
## 970x66-728x90       -4.005751e-01 -2.1028728538  1.301722602 1.0000000
## 970x90-728x90       -1.211634e-01 -0.2685438597  0.026217137 0.3519013
## 970x250-750x570     -4.859438e-01 -2.1884609048  1.216573355 1.0000000
## 970x500-750x570     -1.000000e+00 -2.7942503734  0.794250373 0.9806289
## 970x66-750x570      -1.000000e+00 -3.4072394821  1.407239482 0.9999287
## 970x90-750x570      -7.205882e-01 -2.4290101331  0.987833663 0.9998993
## 970x500-970x250     -5.140562e-01 -1.0824724941  0.054360044 0.1603141
## 970x66-970x250      -5.140562e-01 -2.2165733546  1.188460905 1.0000000
## 970x90-970x250      -2.346445e-01 -0.3845378679 -0.084751052 0.0000010
## 970x66-970x500       2.258194e-13 -1.7942503734  1.794250373 1.0000000
## 970x90-970x500       2.794118e-01 -0.3064532872  0.865276817 0.9986701
## 970x90-970x66        2.794118e-01 -1.4290101331  1.987833663 1.0000000

## ================================================================================
# 7. FINAL SUMMARY
cat("\n")
cat(strrep("=", 80), "\n")
## ================================================================================
cat("FINAL SUMMARY: ARE THERE SYSTEMATIC DIFFERENCES?\n")
## FINAL SUMMARY: ARE THERE SYSTEMATIC DIFFERENCES?
cat(strrep("=", 80), "\n\n")
## ================================================================================
# Check each category for systematic differences
check_differences <- function(data, factor_var) {
  if (length(unique(data[[factor_var]])) > 1) {
    # Run ANOVA
    aov_test <- aov(BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
    p_value <- summary(aov_test)[[1]]$"Pr(>F)"[1]

    # Calculate range of win rates
    win_rates <- data %>%
      group_by(!!sym(factor_var)) %>%
      summarise(win_rate = mean(BID_WON_numeric), .groups = "drop")

    range_diff <- max(win_rates$win_rate) - min(win_rates$win_rate)
    # Note: this ratio is Inf whenever the lowest group win rate is 0
    ratio <- max(win_rates$win_rate) / min(win_rates$win_rate)

    return(list(
      has_differences = p_value < 0.05,
      p_value = p_value,
      range = range_diff,
      ratio = ratio,
      levels = nrow(win_rates)
    ))
  } else {
    return(list(
      has_differences = FALSE,
      p_value = NA,
      range = 0,
      ratio = 1,
      levels = 1
    ))
  }
}

# Check all factors
factors_to_check <- list(
  "Device Type" = "DEVICE_TYPE_clean",
  "Region" = "DEVICE_GEO_REGION_clean",
  "Ad Size" = "SIZE"
)

cat("Systematic Differences Analysis:\n")
## Systematic Differences Analysis:
cat(strrep("-", 80), "\n")
## --------------------------------------------------------------------------------
for (factor_name in names(factors_to_check)) {
  factor_var <- factors_to_check[[factor_name]]
  result <- check_differences(bids_analysis, factor_var)

  cat(factor_name, ":\n")
  cat("  Number of categories:", result$levels, "\n")
  cat("  Statistical significance (p-value):",
      ifelse(is.na(result$p_value), "N/A",
             paste(round(result$p_value, 4),
                   ifelse(result$p_value < 0.05, "**SIGNIFICANT**", "not significant"))), "\n")
  cat("  Win rate range:", round(result$range, 4), "\n")
  cat("  Best to worst ratio:", round(result$ratio, 2), "x\n")
  cat("  Conclusion:",
      ifelse(result$has_differences,
             "YES - Systematic differences exist",
             "NO - No systematic differences"), "\n")
  cat(strrep("-", 40), "\n")
}
## Device Type :
##   Number of categories: 5 
##   Statistical significance (p-value): 0 **SIGNIFICANT** 
##   Win rate range: 0.2506 
##   Best to worst ratio: 2.12 x
##   Conclusion: YES - Systematic differences exist 
## ---------------------------------------- 
## Region :
##   Number of categories: 1 
##   Statistical significance (p-value): N/A 
##   Win rate range: 0 
##   Best to worst ratio: 1 x
##   Conclusion: NO - No systematic differences 
## ---------------------------------------- 
## Ad Size :
##   Number of categories: 38 
##   Statistical significance (p-value): 0 **SIGNIFICANT** 
##   Win rate range: 1 
##   Best to worst ratio: Inf x
##   Conclusion: YES - Systematic differences exist 
## ----------------------------------------
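
The Inf ratio for Ad Size comes from dividing by a minimum win rate of 0, and the p-value prints as 0 only because it is rounded to four decimals. A minimal sketch of guarding both, using made-up win rates; safe_ratio is a hypothetical helper and is not part of the analysis above:

```r
# Made-up win rates for three ad sizes (illustrative only)
win_rates <- c(size_a = 0.50, size_b = 0.25, size_c = 0)

# Plain max/min ratio blows up when any group never wins
naive_ratio <- max(win_rates) / min(win_rates)

# Hypothetical helper: compare only groups with a non-zero win rate
safe_ratio <- function(x) {
  positive <- x[x > 0]
  if (length(positive) < 2) return(NA_real_)
  max(positive) / min(positive)
}
guarded_ratio <- safe_ratio(win_rates)

# Rounding hides very small p-values; format.pval() reports them as "< eps"
tiny_p <- 1e-20
rounded_p <- round(tiny_p, 4)
formatted_p <- format.pval(tiny_p, digits = 3)

print(c(naive = naive_ratio, guarded = guarded_ratio))
print(c(rounded = as.character(rounded_p), formatted = formatted_p))
```

A win rate of exactly 0 or 1 can also come from a group with only a handful of bids, so it may be worth checking group sizes alongside the ratio.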

0.9.3 4. Geographic Analysis

  • Visual representation of the data on a geographic map

  • Q: How does proximity to Portland affect the price of a bid?

  • Quick comparison of price between the raw and cleaned data. The mean is nearly unchanged (0.446 raw vs 0.442 cleaned), so the removed outliers had little effect on the overall level of prices.

bids_clean_2 <- bids_clean

0.9.4 Comparison between uncleaned and cleaned price data

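This comparison is not needed for the presentation; it was central to the original Q4. It shows that cleaning barely moved the mean price, while the standard deviation and range tightened once extreme values (a minimum of -999 and a maximum of 141) were removed.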

library(hexbin)
library(geosphere)



col_comparison <- function(df_1, df_2, col_1, col_2, label_1 = "Before", label_2 = "After") {
  bind_rows(
    df_1 %>%
      summarise(
        mean = mean(.data[[col_1]], na.rm = TRUE),
        sd = sd(.data[[col_1]], na.rm = TRUE),
        min = min(.data[[col_1]], na.rm = TRUE),
        max = max(.data[[col_1]], na.rm = TRUE),
        n = n(),
        na_count = sum(is.na(.data[[col_1]]))
      ) %>%
      mutate(dataset = label_1, .before = 1),

    df_2 %>%
      summarise(
        mean = mean(.data[[col_2]], na.rm = TRUE),
        sd = sd(.data[[col_2]], na.rm = TRUE),
        min = min(.data[[col_2]], na.rm = TRUE),
        max = max(.data[[col_2]], na.rm = TRUE),
        n = n(),
        na_count = sum(is.na(.data[[col_2]]))
      ) %>%
      mutate(dataset = label_2, .before = 1)
  )
}

col_comparison(bids, bids_clean, "PRICE_clean", "PRICE_final")
## # A tibble: 2 × 7
##   dataset  mean    sd         min    max      n na_count
##   <chr>   <dbl> <dbl>       <dbl>  <dbl>  <int>    <int>
## 1 Before  0.446 5.48  -999        141.   441535        0
## 2 After   0.442 0.658    0.000071  10.00 440983        0
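
To see why a few extreme prices can change the sd and range a lot while barely moving the mean, here is a toy illustration in base R. All numbers are made up, and the (0, 10] filter is an assumed cleaning rule for this sketch only, not necessarily the rule used in Lab 3:

```r
# Made-up prices: 300,000 typical values plus two extreme entries
typical <- rep(c(0.2, 0.4, 0.6), times = 100000)
raw <- c(typical, -999, 141)

# Assumed cleaning rule for this sketch only: keep prices in (0, 10]
cleaned <- raw[raw > 0 & raw <= 10]

# Same summary statistics as col_comparison(), in base R
summarise_prices <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x))
}

comparison <- rbind(
  Before = summarise_prices(raw),
  After = summarise_prices(cleaned)
)
print(round(comparison, 3))
```

With hundreds of thousands of rows, two extreme values shift the mean only slightly but dominate the variance, which matches the pattern in the table above.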

1 Price vs Distance to Portland (grouped)

Oregon was divided into hexagonal bins (xbins = 60, i.e. 60 bins across the longitude range) and the bids were grouped by hex. For each hex we computed the average bid price and the average winning bid price, then plotted both against the hex's distance to Portland.

#------------------------------------------------------------------------
# Group Zips into Hexes and categorize hex by city with most frequent
# zip per hex
#------------------------------------------------------------------------
bids_clean <- bids_clean_2



# Bin bid coordinates into hexagons; IDs = TRUE records each point's cell ID
hb <- hexbin(
  x = bids_clean$DEVICE_GEO_LONG_clean,
  y = bids_clean$DEVICE_GEO_LAT_clean,
  xbins = 60,  # bins across the x (longitude) range; note the ggplot map below uses bins = 30
  IDs = TRUE
)

# Get cell assignments for each point
bids_clean$hex_id <- hb@cID
zip_city_lookup <- zipcodeR::zip_code_db %>%
  filter(state == "OR") %>%
  select(zipcode, major_city)

# Add city based on zip to bids_clean.
bids_clean <- bids_clean %>%
  left_join(
    zip_city_lookup %>% select(zipcode, city = major_city),
    by = c("DEVICE_GEO_ZIP_clean" = "zipcode")
  )

# Find top ZIP per hex and join city names
hex_with_city <- bids_clean %>%
  # Remove rows where hex_id or ZIP is missing
  filter(!is.na(hex_id), !is.na(DEVICE_GEO_ZIP_clean)) %>%
  # Count how many bids per hex + ZIP combination
  count(hex_id, DEVICE_GEO_ZIP_clean, name = "zip_count") %>%
  # Keep only the most frequent ZIP for each hex (1 row per hex)
  slice_max(zip_count, n = 1, by = hex_id, with_ties = FALSE) %>%
  # Attach city name by matching ZIP to zipcodeR lookup table
  left_join(zip_city_lookup, by = c("DEVICE_GEO_ZIP_clean" = "zipcode")) %>%
  # Keep only the columns we need for the final join
  select(hex_id, zip_count, major_city)

# Join to bids
bids_clean <- bids_clean %>%
  left_join(hex_with_city, by = "hex_id")

ggplot() +
  geom_sf(data = zip_code_db, fill = "white", color = "gray80") +
  stat_summary_hex(data = bids_clean,
                   aes(x = DEVICE_GEO_LONG_clean, y = DEVICE_GEO_LAT_clean, z = PRICE_final),
                   fun = mean, bins = 30, alpha = 0.8) +
  theme_minimal()

2 Price vs Distance to Portland (individual bids)

Similar to the above, but bids are not grouped into regions. Each bid is plotted individually, with its distance to Portland calculated from its latitude/longitude.
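
For reference, the Haversine distance that geosphere::distHaversine computes can be sketched in base R as follows. haversine_km is a hypothetical helper written for illustration, and the Salem and Bend coordinates are approximate values chosen for the example, not taken from the dataset:

```r
# Haversine distance in km; r is the Earth radius in meters
# (6378137 is the default radius used by geosphere::distHaversine)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a))) / 1000
}

# Portland center, the same coordinates the code in this section uses
portland_lon <- -122.68
portland_lat <- 45.52

# Approximate city coordinates (illustrative assumptions)
example_points <- data.frame(
  city = c("Portland", "Salem", "Bend"),
  lon = c(-122.68, -123.04, -121.31),
  lat = c(45.52, 44.94, 44.06)
)

# The function is vectorized, so one call handles every row
example_points$dist_km <- haversine_km(
  example_points$lon, example_points$lat,
  portland_lon, portland_lat
)
print(example_points)
```

Because the function is vectorized over its inputs, the same call pattern works on a full column of bid coordinates without rowwise().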

#------------------------------------------------------------------------
# Calculate distance each hex is from Portland's hex
#------------------------------------------------------------------------
# Check if 'dist_to_portland_km' exists in the bids_clean data frame
if ("dist_to_portland_km" %in% colnames(bids_clean)) {
  cat("'dist_to_portland_km' exists in bids_clean\n")
} else {
  cat("Adding 'dist_to_portland_km' to bids_clean\n")

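  # Step 1: Get a representative point per hex from the hexbin object
  # (xcm/ycm are the centers of mass of the points in each cell)
  hex_centers <- tibble(
    hex_id = hb@cell,
    hex_x = hb@xcm,  # mean longitude of the points in the cell
    hex_y = hb@ycm   # mean latitude of the points in the cell
  )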

  # Step 2: Find Portland's hex by city name (major_city in hex_with_city).
  # If several hexes map to Portland, first() keeps only the first match.
  portland_hex_id <- hex_with_city %>%
    filter(major_city == "Portland") %>%
    pull(hex_id) %>%
    first()

  # Step 3: Get Portland's hex center
  portland_center <- hex_centers %>%
    filter(hex_id == portland_hex_id)

  # Step 4: Haversine distance from each hex to Portland's hex
  hex_distances <- hex_centers %>%
    rowwise() %>%
    mutate(
      dist_to_portland_km = distHaversine(
        c(hex_x, hex_y),                              # this hex
        c(portland_center$hex_x, portland_center$hex_y)  # Portland hex
      ) / 1000  # meters to km
    ) %>%
    ungroup()

  # Step 5: Join back to bids_clean
  bids_clean <- bids_clean %>%
    left_join(hex_distances %>% select(hex_id, dist_to_portland_km), by = "hex_id")
}
## Adding 'dist_to_portland_km' to bids_clean

# The per-hex summaries sit outside the if/else so they are rebuilt even when
# the distance column already exists (e.g. when the chunk is re-run)

# Mean price per hex with distance (all bids)
hex_price_distance_all <- bids_clean %>%
  filter(!is.na(dist_to_portland_km), !is.na(PRICE_final)) %>%
  group_by(hex_id, dist_to_portland_km) %>%
  summarise(
    avg_price = mean(PRICE_final, na.rm = TRUE),
    n_bids = n(),
    .groups = "drop"
  )

# Mean price per hex with distance (winning bids only)
hex_price_distance_winning <- bids_clean %>%
  filter(!is.na(dist_to_portland_km), !is.na(PRICE_final), BID_WON_clean == TRUE) %>%
  group_by(hex_id, dist_to_portland_km) %>%
  summarise(
    avg_price = mean(PRICE_final, na.rm = TRUE),
    n_bids = n(),
    .groups = "drop"
  )
p1 <- ggplot(hex_price_distance_all, aes(x = dist_to_portland_km, y = avg_price)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Average Bid Price vs Distance from Portland",
    x = "Distance from Portland (km)",
    y = "Average Price"
  ) +
  theme_minimal()

p2 <- ggplot(hex_price_distance_winning, aes(x = dist_to_portland_km, y = avg_price)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Average Winning Bid Price vs Distance from Portland",
    x = "Distance from Portland (km)",
    y = "Average Price"
  ) +
  theme_minimal()

p1

p2

#------------------------------------------------------------------------
# Calculate distance each bid lat/long is from Portland
#------------------------------------------------------------------------

# Portland center coordinates
portland_lat <- 45.52
portland_lon <- -122.68

if ("dist_to_portland_row_km_bid" %in% colnames(bids_clean)) {
  cat("'dist_to_portland_row_km_bid' exists in bids_clean\n")
} else {
  cat("Adding 'dist_to_portland_row_km_bid' to bids_clean\n")

  bids_clean <- bids_clean %>%
    mutate(
      dist_to_portland_row_km_bid = distHaversine(
        cbind(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean),
        c(portland_lon, portland_lat)
      ) / 1000
    )
}
## Adding 'dist_to_portland_row_km_bid' to bids_clean
# Plot a random sample of 10k winning bids to limit overplotting
ggplot(
  bids_clean %>%
    filter(!is.na(dist_to_portland_row_km_bid), !is.na(PRICE_final), BID_WON_clean == TRUE) %>%
    slice_sample(n = 10000),
  aes(x = dist_to_portland_row_km_bid, y = PRICE_final)
) +
  geom_point(alpha = 0.1, size = 0.3) +
  geom_smooth(method = "loess", color = "red") +
  labs(title = "Individual Winning Bids (10k sample)", x = "Distance (km)", y = "Price") +
  theme_minimal()

3

# Top populated urban areas (> 50,000) that are present in bids_clean$city
urban_cities <- c("Portland", "Salem", "Eugene", "Gresham", "Hillsboro", "Bend", "Beaverton", "Medford", "Springfield", "Corvallis", "Albany")

# add urban/rural flag
bids_clean <- bids_clean %>%
  mutate(area_type = case_when(
    city %in% urban_cities ~ "Urban",
    TRUE ~ "Rural"
  ))

urban_summary <- bids_clean %>%
  group_by(area_type) %>%
  summarise(
    n_bids = n(),
    avg_price = mean(PRICE_final, na.rm = TRUE),
    median_price = median(PRICE_final, na.rm = TRUE),
    sd_price = sd(PRICE_final, na.rm = TRUE),
    win_rate = mean(BID_WON_clean == TRUE, na.rm = TRUE),
    avg_response_time = mean(RESPONSE_TIME_clean, na.rm = TRUE),
    n_cities = n_distinct(city)
  ) %>% print()
## # A tibble: 2 × 8
##   area_type n_bids avg_price median_price sd_price win_rate avg_response_time
##   <chr>      <int>     <dbl>        <dbl>    <dbl>    <dbl>             <dbl>
## 1 Rural      85373     0.489        0.22     0.689    0.293              207.
## 2 Urban     355610     0.431        0.193    0.650    0.268              200.
## # ℹ 1 more variable: n_cities <int>
# urban_summary

# Multiple metrics side-by-side
urban_summary %>%
  pivot_longer(cols = c(avg_price, sd_price, median_price, win_rate, avg_response_time), 
               names_to = "metric", values_to = "value") %>%
  ggplot(aes(x = area_type, y = value, fill = area_type)) +
  scale_fill_viridis_d(option = "cividis") +
  geom_col() +
  facet_wrap(~metric, scales = "free_y", nrow = 1) +
  labs(title = "Urban vs Rural Comparison", x = "", caption = "85373 rural bids, 355610 urban bids") +
  theme_minimal() +
  theme(legend.position = "none")

population_or <- read_csv(here("data", "2025 Preliminary Population Estimates.csv"))
population_or <- population_or %>%
  rename(city = `Incorporated City/Town`,
         population = `Revised Population Estimate july 1, 2024`)


# Add city population to bids_clean by joining on major_city.
bids_clean <- bids_clean %>%
  select(-any_of("population")) %>%  # remove if exists
  left_join(
    population_or %>% select(city, population),
    by = c("major_city" = "city")
  )


bids_clean %>%
  filter(BID_WON_clean == TRUE, !is.na(population)) %>% 
    head(10)
## # A tibble: 10 × 27
##    row_id DATE_UTC_clean TIMESTAMP_clean     AUCTION_ID PUBLISHER_ID PRICE_final
##     <int> <date>         <dttm>              <chr>      <chr>              <dbl>
##  1      3 2025-10-21     2025-10-21 23:42:37 0000060c-… LteIcOiSsaE5       0.23 
##  2      9 2025-10-22     2025-10-22 02:43:14 00000e70-… 3                  0.72 
##  3     13 2025-10-21     2025-10-21 21:57:38 00001359-… LteIcOiSsaE5       0.352
##  4     15 2025-10-22     2025-10-22 00:08:29 0000b6e8-… LteIcOiSsaE5       0.11 
##  5     20 2025-10-22     2025-10-22 04:31:41 00011547-… 243                0.765
##  6     24 2025-10-22     2025-10-22 03:48:57 000196e3-… 3                  0.27 
##  7     25 2025-10-21     2025-10-21 23:05:35 0001bca2-… 243                1.06 
##  8     28 2025-10-22     2025-10-22 04:13:46 0003cbbc-… 3                  0.809
##  9     30 2025-10-22     2025-10-22 00:15:39 00057800-… 0b29abca-22…       0.945
## 10     36 2025-10-21     2025-10-21 21:12:42 0005a93e-… LteIcOiSsaE5       2.66 
## # ℹ 21 more variables: DEVICE_GEO_REGION_clean <chr>,
## #   DEVICE_GEO_ZIP_clean <chr>, DEVICE_GEO_CITY_clean <chr>,
## #   DEVICE_GEO_LAT_clean <dbl>, DEVICE_GEO_LONG_clean <dbl>,
## #   BID_WON_clean <chr>, RESPONSE_TIME_clean <int>, DEVICE_TYPE_clean <chr>,
## #   SIZE <chr>, REQUESTED_SIZES_clean <list>, hour <int>, day_of_week <fct>,
## #   pred_prob <dbl>, hex_id <int>, city <chr>, zip_count <int>,
## #   major_city <chr>, dist_to_portland_km <dbl>, …
bids_clean %>%
  filter(BID_WON_clean == TRUE, !is.na(population)) %>%
  ggplot(aes(x = population, y = PRICE_final)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "loess", color = "red") +
  labs(
    title = "Winning Bid Price vs City Population",
    x = "Population",
    y = "Price"
  ) +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

# Summarise winning bids per city; keep only cities with more than 100 winning bids
city_grouping_winning_bids <- bids_clean %>%
  filter(BID_WON_clean == TRUE, !is.na(population), !is.na(DEVICE_GEO_ZIP_clean)) %>%
  group_by(city) %>%
  summarise(
    avg_price = mean(PRICE_final, na.rm = TRUE),
    n_bids = n(),
    avg_population = mean(population, na.rm = TRUE),
    ave_km_to_pdx = mean(dist_to_portland_row_km_bid, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(n_bids > 100) %>%
  # Note: despite its name, this is population divided by distance to Portland
  mutate(km_to_pdx_o_pop = avg_population / ave_km_to_pdx)




city_grouping_winning_bids %>%
  ggplot(aes(x = avg_population, y = avg_price, size = n_bids)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Average Winning Bid Price vs City Population (by ZIP)",
    x = "Population",
    y = "Average Winning Price",
    size = "# Bids"
  ) +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

# bids_clean <- bids_clean %>%
#     filter(!is.na(dist_to_portland_km), !is.na(PRICE_final)) %>%
#     group_by(hex_id, dist_to_portland_km) %>%
#     summarise(
#       avg_price = mean(PRICE_final, na.rm = TRUE),
#       n_bids = n(),
#       .groups = "drop"
#     )
# urban <- population_or %>% filter(population > 50000) %>% arrange(desc(population))



city_grouping_winning_bids <- city_grouping_winning_bids %>%
  mutate(
    pop_bucket = cut(
      avg_population,
      breaks = seq(0, max(avg_population, na.rm = TRUE) + 10000, by = 1000),
      labels = FALSE,
      include.lowest = TRUE,
      right = TRUE
    )
  )

# Average the per-city prices within each 1,000-person population bucket.
# pop_bucket is the bucket index, so bucket k covers populations up to k * 1,000.
p_pop_bucket <- city_grouping_winning_bids %>%
  filter(!is.na(avg_population), !is.na(avg_price)) %>%
  group_by(pop_bucket) %>%
  summarise(
    avg_price = mean(avg_price, na.rm = TRUE),
    n_cities = n(),  # rows here are cities, not bids
    .groups = "drop"
  ) %>%
  ggplot(aes(x = pop_bucket, y = avg_price)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", color = "red", se = TRUE) +
  labs(
    title = "Average Winning Bid Price vs City Population (1,000-person buckets)",
    x = "Population (thousands)",
    y = "Average Winning Price"
  ) +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

p_pop_bucket

#Data Visualization

#Synthesis

*Notes: Things to make clear in the synthesis portion, for our own use:

What were the points of uncertainty? What steps did we take to address them, and how could they still be showing up? How did we clean and prepare the data set for our work? What choices did we make, and how did we standardize?

#Presentation Assignment:

We will decide who is generating what info and transferring it to the slides.

We also need to decide who will present what and how to field questions.

Final presentation ask: Your final presentation, in 20–30 minutes, should tell a clear, professional story of your team’s data-cleaning workflow, exploratory analysis, and collaboration practices. The goal is to demonstrate not only what you found, but how you worked as a data science team.

Things we need to include in the final presentation:

Key Issues You Found
Examples include:
• Missingness patterns
• Formatting inconsistencies (dates, numeric/character mismatches, categorical typos)
• Duplicates
• Outliers or implausible values
• Structural issues (names, types, consistency)

Your Cleaning Strategy
For each issue:
• What you observed
• Why it was a problem
• Principle behind your fix (e.g., “We chose this imputation strategy because…”)
• Concise R code or pseudo-code
Focus on justifying decisions, not on full code dumps.

Reproducibility and Workflow
Highlight:
• Git usage (branches, PRs, merge conflicts)
• Script modularity and readability
• R Markdown / Quarto documentation
• Naming conventions and folder structure

EDA Overview Patterns
Show:
• Distributions
• Key relationships
• Missingness profiles
• Surprising patterns

Deep Dives on Guiding Questions
For each:
• State the question
• Show relevant figures
• Interpret results clearly
• Connect to cleaning decisions

Visualization Quality
Ensure:
• Clear labels/titles
• Good color choices
• No clutter
• Interpretation accompanies each visual

Insights
Summarize:
• 3–5 most important findings
• What the data suggests overall
• Uncertainties or next steps

Collaboration Reflection
Discuss:
• What went well in your workflow
• What was challenging
• Lessons learned for future projects
• How GitHub, Jira, and RStudio supported collaboration

GitHub Repository Checklist
Confirm your repo includes:
• Clean README
• Data cleaning scripts
• EDA scripts (.qmd or .Rmd)
• Clear folder structure (data/, R/, figs/)
• Kanban snapshot
• Evidence of teamwork (commit history, pull requests, etc.)